Category archive: Knowledge Q&A

Python: Compare Whether Two Time Series Are Graphically Similar

To compare whether two time series look similar when plotted, you can use the following approaches:

  1. Visual comparison: plot both series in the same figure with the same scale and axis labels, and compare their trends, peaks, troughs and other features by eye.
  2. Peak and trough comparison: compare the peaks and troughs of the two series, looking at both their magnitude and their position (a peak-detection sketch is given after the DTW example below).
  3. Correlation analysis: compute the correlation coefficient between the two series to check for a linear relationship; a coefficient close to 1 suggests the trends are similar.
  4. Non-linear methods: compare the two series with non-linear techniques such as dynamic time warping (DTW) or wavelet transforms, which can capture similarity that simpler measures miss.

Note that graphical similarity does not fully capture the similarity between two time series, because very different series can produce the same-looking plot. When comparing time series, you should therefore combine several kinds of evidence.

1. Visual comparison of two time series with Matplotlib:

import matplotlib.pyplot as plt

# Generate time series data
x = [1, 2, 3, 4, 5]
y1 = [10, 15, 13, 17, 20]
y2 = [8, 12, 14, 18, 22]

# Plot both time series as line charts
plt.plot(x, y1, label='y1')
plt.plot(x, y2, label='y2')

# Set figure properties
plt.xlabel('Time')
plt.ylabel('Value')
plt.title('Comparison of two time series')
plt.legend()

# Show the figure
plt.show()

2. Compute the correlation coefficient of the two time series:

import numpy as np

# Generate time series data
x = [1, 2, 3, 4, 5]
y1 = [10, 15, 13, 17, 20]
y2 = [8, 12, 14, 18, 22]

# Compute the correlation coefficient
corr = np.corrcoef(y1, y2)[0, 1]

# Print the result
print('Correlation coefficient:', corr)

3. Implement the dynamic time warping (DTW) algorithm in Python:

import numpy as np

# Generate time series data
x = [1, 2, 3, 4, 5]
y1 = [10, 15, 13, 17, 20]
y2 = [8, 12, 14, 18, 22]

# Dynamic time warping algorithm
def dtw_distance(ts_a, ts_b, d=lambda x, y: abs(x - y)):
    DTW = {}

    # Initialise the boundary conditions
    for i in range(len(ts_a)):
        DTW[(i, -1)] = float('inf')
    for i in range(len(ts_b)):
        DTW[(-1, i)] = float('inf')
    DTW[(-1, -1)] = 0

    # Fill in the DTW matrix
    for i in range(len(ts_a)):
        for j in range(len(ts_b)):
            cost = d(ts_a[i], ts_b[j])
            DTW[(i, j)] = cost + min(DTW[(i-1, j)], DTW[(i, j-1)], DTW[(i-1, j-1)])

    # Return the DTW distance
    return DTW[len(ts_a)-1, len(ts_b)-1]

# Compute the DTW distance between the two time series
dtw_dist = dtw_distance(y1, y2)

# Print the result
print('DTW distance:', dtw_dist)
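
For approach 2 (peak and trough comparison) the text above gives no code, so here is a minimal sketch using scipy.signal.find_peaks on the same toy series. It only illustrates the mechanics; whether peak positions and amplitudes are a meaningful similarity measure depends on your data.

import numpy as np
from scipy.signal import find_peaks

y1 = np.array([10, 15, 13, 17, 20])
y2 = np.array([8, 12, 14, 18, 22])

# Peaks are local maxima; troughs are peaks of the negated series
peaks1, _ = find_peaks(y1)
troughs1, _ = find_peaks(-y1)
peaks2, _ = find_peaks(y2)
troughs2, _ = find_peaks(-y2)

print('y1 peaks at indices:', peaks1, 'values:', y1[peaks1])
print('y2 peaks at indices:', peaks2, 'values:', y2[peaks2])
print('y1 troughs at indices:', troughs1)
print('y2 troughs at indices:', troughs2)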

If my data is a 2-D array of shape N x (20×14), how do I convert it to (N, 1, 20, 14)?

If your data is a 2-D array of shape (N, 20×14), you can use NumPy's reshape method to turn it into a 4-D tensor of shape (N, 1, 20, 14). Concretely, you can use the following code:

import numpy as np
import torch

# Assume your data is a 2-D array of shape (N, 20x14) named data
N, H, W = data.shape[0], 20, 14

# Reshape the data from (N, 20x14) to (N, 1, 20, 14)
data = data.reshape(N, 1, H, W)

# Convert the NumPy array into a PyTorch tensor
data = torch.from_numpy(data)

Here we first set the input height and width to 20 and 14, then use NumPy's reshape method to convert the data from a 2-D array of shape (N, 20×14) into a 4-D tensor of shape (N, 1, 20, 14), and finally convert the NumPy array into a PyTorch tensor.

Note that the channel dimension is 1 because the input data has a single channel. If your input has multiple channels, set the channel dimension accordingly.
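
As a quick self-contained check, the sketch below uses random dummy data (the sample count 8 is just a placeholder) and prints the shapes before and after the conversion:

import numpy as np
import torch

# Dummy data: 8 samples, each flattened to 20 * 14 = 280 values
data = np.random.rand(8, 20 * 14).astype(np.float32)
print(data.shape)                      # (8, 280)

data = data.reshape(data.shape[0], 1, 20, 14)
tensor = torch.from_numpy(data)
print(tensor.shape)                    # torch.Size([8, 1, 20, 14])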

The Conservative Formula: Quantitative Investing Made Easy

We present a conservative investment formula that makes quantitative investing easy. Using three simple investment criteria, a single strategy of 100 liquid stocks captures several different factor premiums. The conservative formula uses only past returns and the net payout yield, so no accounting data or other data sources are needed. Despite its simplicity, the strategy gains positive exposure to the well-known low-beta, value, quality and momentum factors at the same time.

The strategy's returns are stable over time and across international equity markets, and the effect is also present in US small- and mid-cap stocks.

In short, active investors can use this simple formula to profit directly from half a century of academic insight.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3145152
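
The summary above ships no code; the following is only a hypothetical pandas sketch of a rank-based screen in the spirit of the three criteria it mentions. The input file and all column names (ret_12m, vol_36m, npy) are assumptions for illustration, not the exact definitions used in the paper.

import pandas as pd

# Hypothetical universe file: one row per liquid stock; column names are assumptions.
#   ret_12m : past 12-month total return (momentum proxy)
#   vol_36m : volatility of the past 36 monthly returns
#   npy     : net payout yield (dividends plus buybacks minus issuance, over market cap)
universe = pd.read_csv('liquid_stocks.csv')

# Keep the lower-volatility half of the universe (the "conservative" filter)
low_vol = universe[universe['vol_36m'] <= universe['vol_36m'].median()]

# Rank the survivors on momentum and net payout yield, then average the ranks
ranks = low_vol[['ret_12m', 'npy']].rank(ascending=False)
low_vol = low_vol.assign(score=ranks.mean(axis=1))

# Hold the 100 best-ranked names, equally weighted
portfolio = low_vol.nsmallest(100, 'score')
print(portfolio[['ret_12m', 'npy', 'score']].head())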

How to Publish WordPress Posts Automatically with Python

1. Preparation

Before you start, make sure Python and pip are installed on your computer. If not, see this article for installation steps: the detailed Python installation guide.

(Optional 1) If you mainly use Python for data analysis, you can install Anaconda directly (see: Anaconda, a great helper for Python data analysis and mining); it bundles Python and pip.

(Optional 2) We also recommend the VSCode editor for writing small Python projects (see: the detailed VSCode guide, the best companion for Python programming).

On Windows, open Cmd (Start, then Run, then type CMD); on macOS, open Terminal (Command + Space, then type Terminal). Normally you would install dependencies here with pip, but the code below only uses xmlrpc.client, which ships with Python's standard library, so nothing extra needs to be installed.

2. Write the code

import xmlrpc.client

# The WordPress XML-RPC API URL
url = 'https://yourwebsite.com/xmlrpc.php'

# WordPress login credentials
username = 'your_username'
password = 'your_password'

# Create the XML-RPC client object
wp = xmlrpc.client.ServerProxy(url)

# Build the post as a dictionary
post = {
    'post_title': 'My New Post',
    'post_content': 'This is the content of my new post.',
    'post_status': 'publish'
}

# Publish the post via the XML-RPC API
# (the first argument is blog_id, which single-site WordPress ignores)
post_id = wp.wp.newPost('', username, password, post)

# Print the ID of the newly published post
print('New post published with ID:', post_id)
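
If you prefer a higher-level client, the third-party python-wordpress-xmlrpc package (installed with pip install python-wordpress-xmlrpc) wraps the same API. A rough equivalent of the script above might look like this; the URL and credentials are placeholders:

from wordpress_xmlrpc import Client, WordPressPost
from wordpress_xmlrpc.methods.posts import NewPost

client = Client('https://yourwebsite.com/xmlrpc.php', 'your_username', 'your_password')

post = WordPressPost()
post.title = 'My New Post'
post.content = 'This is the content of my new post.'
post.post_status = 'publish'

# Returns the ID of the newly created post
post_id = client.call(NewPost(post))
print('New post published with ID:', post_id)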


How to Deploy a FastAPI Service to a Linux Server

To deploy FastAPI on CentOS, follow these steps:

1. Install Python and pip

First, install Python and pip on CentOS with the following command:

sudo yum install python3 python3-pip

2. Install FastAPI and uvicorn

Then install FastAPI and uvicorn with the following commands:

sudo pip3 install fastapi
sudo pip3 install uvicorn

3. Create the application

Create your FastAPI application. You can write the code locally and upload it to the server, or create the code file directly on CentOS.
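
For reference, a minimal main.py might look like the sketch below; the route and message are placeholders, not part of the original instructions:

# main.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def read_root():
    return {"message": "Hello from FastAPI on CentOS"}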

4. Run the application

Run your FastAPI application with uvicorn. For example, if your application file is named main.py, you can run it with:

uvicorn main:app --host 0.0.0.0 --port 8000

This starts the FastAPI application on CentOS and binds it to 0.0.0.0:8000.

5. Set up firewall rules

If the firewall is enabled on your CentOS system, you need to add a rule allowing traffic to the FastAPI application. You can add it with the following commands:

sudo firewall-cmd --permanent --add-port=8000/tcp
sudo firewall-cmd --reload

This allows traffic on TCP port 8000 through the firewall.

Your FastAPI application is now deployed on CentOS and can be reached via the server's IP address or domain name.
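
One point the steps above do not cover: uvicorn started this way stops when the SSH session ends. A simple stopgap is to run it under nohup (a process manager such as systemd or supervisor is the more robust choice); the log file name here is just an example:

nohup uvicorn main:app --host 0.0.0.0 --port 8000 > fastapi.log 2>&1 &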

What is ChatGPT's full name? Is it public? Where can you get it?

ChatGPT is short for "Chat Generative Pre-trained Transformer". It is a pretrained language model based on the Transformer architecture, developed by OpenAI, and is widely used in natural language processing tasks such as text generation, dialogue systems and machine translation.

Is it public?

Partly. OpenAI has released pretrained weights for its earlier GPT models (GPT-1 and GPT-2), which can be obtained and used through third-party libraries such as Hugging Face. GPT-3 and ChatGPT itself have not been released as downloadable weights; they are only accessible through OpenAI's API.

In addition, OpenAI provides open-source tools and code, including utilities for training and evaluating language models and example code for applications such as text generation and dialogue systems. These tools make it easier for developers to use and customise GPT-style models and speed up application development and deployment.

Where to get it

Pretrained GPT weights can be obtained from several sources, including:

  1. The Hugging Face model hub: Hugging Face is a community for NLP models and tools that hosts a large number of pretrained models, including GPT-family checkpoints. You can browse the available GPT models at https://huggingface.co/models and download the corresponding weights.
  2. OpenAI: OpenAI, the developer of the GPT models, lists its GPT-series models at https://beta.openai.com/models/gpt; note that GPT-3 and newer models are served through the API rather than offered as downloadable weights.
  3. TensorFlow Hub: if you use TensorFlow, you can search for GPT models published on TensorFlow Hub at https://tfhub.dev/s?q=gpt and download the corresponding weights.

Note that these pretrained weights are usually very large, so downloading and loading them can take a long time and require significant compute resources.

How do you use GPT-2, and how does GPT-2 differ from BERT?

How do you use GPT-2?

GPT-2 model: https://github.com/openai/gpt-2

GPT-2 is a very powerful language generation model that can be used for a variety of natural language processing tasks such as text generation, language understanding, machine translation and sentiment analysis. The basic steps for using it are:

  1. Install the GPT-2 model: before using GPT-2 you need to download and install the model, either from OpenAI's official repository or through one of the Python libraries that package it.
  2. Prepare data: before generating text with GPT-2 you need a corpus, which can be your own text dataset or one collected from the web.
  3. Run GPT-2: once the data and the model are ready you can run GPT-2. Depending on your task you may need to fine-tune the model or try different hyperparameters to improve the generated text.
  4. Generate text: once the model is ready you can use it to generate text, either by calling the model's API or by using an existing tool (see the Hugging Face example below).

Note that GPT-2 is a very large model and needs substantial compute to train and run. Before using it, make sure you have enough computing resources and the technical knowledge to configure and run the model correctly.
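
As a concrete illustration of steps 1 and 4, the Hugging Face transformers library (pip install transformers torch) ships ready-made GPT-2 weights and a generation API. The prompt and sampling settings below are arbitrary choices, not part of the original answer:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Download (on first run) and load the small GPT-2 checkpoint
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Encode a prompt and sample a continuation
input_ids = tokenizer.encode("The weather today is", return_tensors="pt")
output_ids = model.generate(input_ids, max_length=30, do_sample=True, top_k=50)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))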

How does GPT-2 differ from BERT?

BERT model: https://github.com/google-research/bert

GPT-2 and BERT are both deep-learning-based natural language processing models, but they differ in several ways. The main differences are:

  1. Task type: GPT-2 is a generative model that produces continuous text such as articles, stories or dialogue. BERT is a discriminative model used for tasks such as classifying or scoring input text.
  2. Input and output: GPT-2 takes a context sequence and predicts the next word or phrase, which suits language modelling and text generation. BERT takes a complete piece of text and outputs a contextual feature representation of it, which suits text classification, sentiment analysis and similar tasks.
  3. Training data: GPT-2 was pretrained on a large amount of text collected from the web, while BERT was pretrained on general corpora (BooksCorpus and English Wikipedia) and is then fine-tuned on task-specific datasets such as reading comprehension or question answering.
  4. Architecture: GPT-2 uses an autoregressive, decoder-style Transformer that generates a sequence one token at a time conditioned on the previous tokens. BERT uses a bidirectional Transformer encoder that encodes the whole input at once.
  5. Pretraining objective: GPT-2 is trained with a causal (autoregressive) language-modelling objective, predicting each word from the words that precede it. BERT is trained with two objectives: masked language modelling (Masked Language Modeling) and next-sentence prediction (Next Sentence Prediction).

In short, GPT-2 and BERT differ in task type, input and output, training data, architecture and pretraining objective; which one to use depends on your task and data.

Does Python SciPy require BLAS?

Question: Does Python SciPy require BLAS?

numpy.distutils.system_info.BlasNotFoundError: 
    Blas (http://www.netlib.org/blas/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [blas]) or by setting
    the BLAS environment variable.

Which tar do I need to download off this site?

I’ve tried the fortrans, but I keep getting this error (after setting the environment variable obviously).


Answer 0

The SciPy webpage used to provide build and installation instructions, but the instructions there now rely on OS binary distributions. To build SciPy (and NumPy) on operating systems without precompiled packages of the required libraries, you must build and then statically link to the Fortran libraries BLAS and LAPACK:

mkdir -p ~/src/
cd ~/src/
wget http://www.netlib.org/blas/blas.tgz
tar xzf blas.tgz
cd BLAS-*

## NOTE: The selected Fortran compiler must be consistent for BLAS, LAPACK, NumPy, and SciPy.
## For GNU compiler on 32-bit systems:
#g77 -O2 -fno-second-underscore -c *.f                     # with g77
#gfortran -O2 -std=legacy -fno-second-underscore -c *.f    # with gfortran
## OR for GNU compiler on 64-bit systems:
#g77 -O3 -m64 -fno-second-underscore -fPIC -c *.f                     # with g77
gfortran -O3 -std=legacy -m64 -fno-second-underscore -fPIC -c *.f    # with gfortran
## OR for Intel compiler:
#ifort -FI -w90 -w95 -cm -O3 -unroll -c *.f

# Continue below irrespective of compiler:
ar r libfblas.a *.o
ranlib libfblas.a
rm -rf *.o
export BLAS=~/src/BLAS-*/libfblas.a

Execute only one of the five g77/gfortran/ifort commands. I have commented out all, but the gfortran which I use. The subsequent LAPACK installation requires a Fortran 90 compiler, and since both installs should use the same Fortran compiler, g77 should not be used for BLAS.

Next, you’ll need to install the LAPACK stuff. The SciPy webpage’s instructions helped me here as well, but I had to modify them to suit my environment:

mkdir -p ~/src
cd ~/src/
wget http://www.netlib.org/lapack/lapack.tgz
tar xzf lapack.tgz
cd lapack-*/
cp INSTALL/make.inc.gfortran make.inc          # On Linux with lapack-3.2.1 or newer
make lapacklib
make clean
export LAPACK=~/src/lapack-*/liblapack.a

Update on 3-Sep-2015: Verified some comments today (thanks to all): Before running make lapacklib edit the make.inc file and add -fPIC option to OPTS and NOOPT settings. If you are on a 64bit architecture or want to compile for one, also add -m64. It is important that BLAS and LAPACK are compiled with these options set to the same values. If you forget the -fPIC SciPy will actually give you an error about missing symbols and will recommend this switch. The specific section of make.inc looks like this in my setup:

FORTRAN  = gfortran 
OPTS     = -O2 -frecursive -fPIC -m64
DRVOPTS  = $(OPTS)
NOOPT    = -O0 -frecursive -fPIC -m64
LOADER   = gfortran

On old machines (e.g. RedHat 5), gfortran might be installed in an older version (e.g. 4.1.2) and does not understand option -frecursive. Simply remove it from the make.inc file in such cases.

The lapack test target of the Makefile fails in my setup because it cannot find the blas libraries. If you are thorough you can temporarily move the blas library to the specified location to test the lapack. I’m a lazy person, so I trust the devs to have it working and verify only in SciPy.


Answer 1

If you need to use the latest versions of SciPy rather than the packaged version, without going through the hassle of building BLAS and LAPACK, you can follow the below procedure.

Install linear algebra libraries from repository (for Ubuntu),

sudo apt-get install gfortran libopenblas-dev liblapack-dev

Then install SciPy (after downloading the SciPy source): python setup.py install, or

pip install scipy

As the case may be.


Answer 2

On Fedora, this works:

 yum install lapack lapack-devel blas blas-devel
 pip install numpy
 pip install scipy

Remember to install 'lapack-devel' and 'blas-devel' in addition to 'blas' and 'lapack', otherwise you'll get the error you mentioned or the "numpy.distutils.system_info.LapackNotFoundError" error.


Answer 3

I guess you are talking about installation in Ubuntu. Just use:

apt-get install python-numpy python-scipy

That should take care of the BLAS libraries compiling as well. Else, compiling the BLAS libraries is very difficult.


Answer 4

For Windows users there is a nice binary package by Chris (warning: it’s a pretty large download, 191 MB):


Answer 5

Following the instructions given by ‘cfi’ works for me, although there are a few pieces they left out that you might need:

1) Your lapack directory, after unzipping, may be called lapack-X-Y (some version number), so you can just rename that to LAPACK.

cd ~/src
mv lapack-[tab] LAPACK

2) In that directory, you may need to do:

cd ~/src/LAPACK 
cp lapack_LINUX.a libflapack.a

Answer 6

Try using

sudo apt-get install python3-scipy

How do I find the thread ID in Python?

Question: How do I find the thread ID in Python?

I have a multi-threading Python program, and a utility function, writeLog(message), that writes out a timestamp followed by the message. Unfortunately, the resultant log file gives no indication of which thread is generating which message.

I would like writeLog() to be able to add something to the message to identify which thread is calling it. Obviously I could just make the threads pass this information in, but that would be a lot more work. Is there some thread equivalent of os.getpid() that I could use?


Answer 0

threading.get_ident() works, or threading.current_thread().ident (or threading.currentThread().ident for Python < 2.6).


Answer 1

Using the logging module you can automatically add the current thread identifier in each log entry. Just use one of these LogRecord mapping keys in your logger format string:

%(thread)d : Thread ID (if available).

%(threadName)s : Thread name (if available).

and set up your default handler with it:

logging.basicConfig(format="%(threadName)s:%(message)s")
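
A small runnable sketch of this approach (the thread names and message are placeholders):

import logging
import threading

logging.basicConfig(format="%(threadName)s: %(message)s", level=logging.INFO)

def worker():
    logging.info("doing some work")   # the thread name is filled in automatically

threads = [threading.Thread(target=worker, name=f"worker-{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()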

Answer 2

The thread.get_ident() function returns a long integer on Linux. It’s not really a thread id.

I use this method to really get the thread id on Linux:

import ctypes
libc = ctypes.cdll.LoadLibrary('libc.so.6')

# System dependent, see e.g. /usr/include/x86_64-linux-gnu/asm/unistd_64.h
SYS_gettid = 186

def getThreadId():
   """Returns OS thread id - Specific to Linux"""
   return libc.syscall(SYS_gettid)

Answer 4

I saw examples of thread IDs like this:

class myThread(threading.Thread):
    def __init__(self, threadID, name, counter):
        self.threadID = threadID
        ...

The threading module docs list the name attribute as well:

...

A thread has a name. 
The name can be passed to the constructor, 
and read or changed through the name attribute.

...

Thread.name

A string used for identification purposes only. 
It has no semantics. Multiple threads may
be given the same name. The initial name is set by the constructor.

Answer 5

You can get the ident of the current running thread. The ident could be reused for other threads, if the current thread ends.

When you create an instance of Thread, a name is given implicitly to the thread, following the pattern: Thread-number

The name has no meaning and the name doesn't have to be unique. The ident of all running threads is unique.

import threading


def worker():
    print(threading.current_thread().name)
    print(threading.get_ident())


threading.Thread(target=worker).start()
threading.Thread(target=worker, name='foo').start()

The function threading.current_thread() returns the current running thread. This object holds the whole information of the thread.


Answer 6

I created multiple threads in Python, I printed the thread objects, and I printed the id using the ident variable. I see all the ids are same:

<Thread(Thread-1, stopped 140500807628544)>
<Thread(Thread-2, started 140500807628544)>
<Thread(Thread-3, started 140500807628544)>

Answer 7

Similarly to @brucexin I needed to get the OS-level thread identifier (which != thread.get_ident()) and used something like the code below so as not to depend on particular syscall numbers or on being amd64-only:

---- 8< ---- (xos.pyx)
"""module xos complements standard module os""" 

cdef extern from "<sys/syscall.h>":                                                             
    long syscall(long number, ...)                                                              
    const int SYS_gettid                                                                        

# gettid returns current OS thread identifier.                                                  
def gettid():                                                                                   
    return syscall(SYS_gettid)                                                                  

and

---- 8< ---- (test.py)
import pyximport; pyximport.install()
import xos

...

print 'my tid: %d' % xos.gettid()

this depends on Cython though.


Read specific columns from a CSV file with the csv module?

Question: Read specific columns from a CSV file with the csv module?

I’m trying to parse through a csv file and extract the data from only specific columns.

Example csv:

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

I’m trying to capture only specific columns, say ID, Name, Zip and Phone.

Code I’ve looked at has led me to believe I can call the specific column by its corresponding number, so ie: Name would correspond to 2 and iterating through each row using row[2] would produce all the items in column 2. Only it doesn’t.

Here’s what I’ve done so far:

import sys, argparse, csv
from settings import *

# command arguments
parser = argparse.ArgumentParser(description='csv to postgres',\
 fromfile_prefix_chars="@" )
parser.add_argument('file', help='csv file to import', action='store')
args = parser.parse_args()
csv_file = args.file

# open csv file
with open(csv_file, 'rb') as csvfile:

    # get number of columns
    for line in csvfile.readlines():
        array = line.split(',')
        first_item = array[0]

    num_columns = len(array)
    csvfile.seek(0)

    reader = csv.reader(csvfile, delimiter=' ')
        included_cols = [1, 2, 6, 7]

    for row in reader:
            content = list(row[i] for i in included_cols)
            print content

and I’m expecting that this will print out only the specific columns I want for each row except it doesn’t, I get the last column only.


Answer 0

The only way you would be getting the last column from this code is if you don’t include your print statement in your for loop.

This is most likely the end of your code:

for row in reader:
    content = list(row[i] for i in included_cols)
print content

You want it to be this:

for row in reader:
        content = list(row[i] for i in included_cols)
        print content

Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module.

Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:

import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']

so if you wanted to save all of the info in your column Names into a variable, this is all you need to do:

names = df.Names

It's a great module and I suggest you look into it. If for some reason your print statement was in the for loop and it was still only printing out the last column (which shouldn't happen), let me know if my assumption was wrong. Your posted code has a lot of indentation errors so it was hard to know what was supposed to be where. Hope this was helpful!


Answer 1

import csv
from collections import defaultdict

columns = defaultdict(list) # each value in each column is appended to a list

with open('file.txt') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        for (k,v) in row.items(): # go over each column name and value 
            columns[k].append(v) # append the value into the appropriate list
                                 # based on column name k

print(columns['name'])
print(columns['phone'])
print(columns['street'])

With a file like

name,phone,street
Bob,0893,32 Silly
James,000,400 McHilly
Smithers,4442,23 Looped St.

Will output

>>> 
['Bob', 'James', 'Smithers']
['0893', '000', '4442']
['32 Silly', '400 McHilly', '23 Looped St.']

Or alternatively if you want numerical indexing for the columns:

with open('file.txt') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row (reader.next() is Python 2 only)
    for row in reader:
        for (i,v) in enumerate(row):
            columns[i].append(v)
print(columns[0])

>>> 
['Bob', 'James', 'Smithers']

To change the deliminator add delimiter=" " to the appropriate instantiation, i.e reader = csv.reader(f,delimiter=" ")


Answer 2

Use pandas:

import pandas as pd
my_csv = pd.read_csv(filename)
column = my_csv.column_name
# you can also use my_csv['column_name']

Discard unneeded columns at parse time:

my_filtered_csv = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])

P.S. I’m just aggregating what other’s have said in a simple manner. Actual answers are taken from here and here.


Answer 3

With pandas you can use read_csv with usecols parameter:

df = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])

Example:

import pandas as pd
import io

s = '''
total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.5,Male,No,Sun,Dinner,3
'''

df = pd.read_csv(io.StringIO(s), usecols=['total_bill', 'day', 'size'])
print(df)

   total_bill  day  size
0       16.99  Sun     2
1       10.34  Sun     3
2       21.01  Sun     3

Answer 4

You can use numpy.loadtxt(filename). For example, if this is your database .csv:

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | Adam | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Carl | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Adolf | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Den | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

And you want the Name column:

import numpy as np 
b=np.loadtxt(r'filepath\name.csv',dtype=str,delimiter='|',skiprows=1,usecols=(1,))

>>> b
array([' Adam ', ' Carl ', ' Adolf ', ' Den '], 
      dtype='|S7')

More easily, you can use genfromtxt:

b = np.genfromtxt(r'filepath\name.csv', delimiter='|', names=True,dtype=None)
>>> b['Name']
array([' Adam ', ' Carl ', ' Adolf ', ' Den '], 
      dtype='|S7')

Answer 5

Context: For this type of work you should use the amazing python petl library. That will save you a lot of work and potential frustration from doing things ‘manually’ with the standard csv module. AFAIK, the only people who still use the csv module are those who have not yet discovered better tools for working with tabular data (pandas, petl, etc.), which is fine, but if you plan to work with a lot of data in your career from various strange sources, learning something like petl is one of the best investments you can make. To get started should only take 30 minutes after you’ve done pip install petl. The documentation is excellent.

Answer: Let’s say you have the first table in a csv file (you can also load directly from the database using petl). Then you would simply load it and do the following.

from petl import fromcsv, look, cut, tocsv 

#Load the table
table1 = fromcsv('table1.csv')
# Alter the colums
table2 = cut(table1, 'Song_Name','Artist_ID')
#have a quick look to make sure things are ok. Prints a nicely formatted table to your console
print look(table2)
# Save to new file
tocsv(table2, 'new.csv')

Answer 6

I think there is an easier way

import pandas as pd

dataset = pd.read_csv('table1.csv')
ftCol = dataset.iloc[:, 0].values

So here, in iloc[:, 0], the : means all rows and 0 is the position of the column. In the example below, the ID column will be selected.

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

Answer 7

import pandas as pd
csv_file = pd.read_csv("file.csv")
column_val_list = csv_file.column_name._ndarray_values

Answer 8

Thanks to the way you can index and subset a pandas dataframe, a very easy way to extract a single column from a csv file into a variable is:

myVar = pd.read_csv('YourPath', sep = ",")['ColumnName']

A few things to consider:

The snippet above will produce a pandas Series and not a dataframe. The suggestion from ayhan with usecols will also be faster if speed is an issue. Testing the two different approaches using %timeit on a 2122 KB sized csv file yields 22.8 ms for the usecols approach and 53 ms for my suggested approach.

And don’t forget import pandas as pd


Answer 9

If you need to process the columns separately, I like to destructure the columns with the zip(*iterable) pattern (effectively “unzip”). So for your example:

ids, names, zips, phones = zip(*(
  (row[1], row[2], row[6], row[7])
  for row in reader
))

Answer 10

To fetch the column names, instead of using readlines() it is better to use readline(), to avoid looping over and reading the complete file and storing it in an array.

with open(csv_file, 'rb') as csvfile:

    # get number of columns

    line = csvfile.readline()

    first_item = line.split(',')