
Comparing whether two time series look similar on a plot in Python


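  1. Visual comparison: plot both time series on the same figure, with the same scale and axis labels, and compare their trends, peaks, and troughs.
  2. Peak and trough comparison: compare the amplitude and position of the peaks and troughs in the two series.
  3. Correlation analysis: compute the correlation coefficient between the two series to check for a linear relationship. A coefficient close to 1 indicates that their trends are similar.
  4. Non-linear methods: use methods such as dynamic time warping (DTW) or the wavelet transform, which can capture similarity that a linear correlation misses, for example when one series is shifted or stretched in time.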
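
As a rough illustration of approach 2, the sketch below finds local peaks and troughs with a simple neighbour comparison and lists their positions and amplitudes for two series. The helper name local_extrema and the sample data are assumptions made for this example; noisy real-world data would usually need smoothing first.

```python
def local_extrema(series):
    """Return (peaks, troughs) as lists of (index, value) pairs.

    A point is a peak if it is strictly greater than both neighbours,
    and a trough if it is strictly smaller than both neighbours.
    Endpoints are ignored.
    """
    peaks, troughs = [], []
    for i in range(1, len(series) - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        if cur > prev and cur > nxt:
            peaks.append((i, cur))
        elif cur < prev and cur < nxt:
            troughs.append((i, cur))
    return peaks, troughs


y1 = [10, 15, 13, 17, 20]
y2 = [8, 12, 14, 18, 22]

peaks1, troughs1 = local_extrema(y1)
peaks2, troughs2 = local_extrema(y2)
print("y1 peaks:", peaks1, "troughs:", troughs1)
print("y2 peaks:", peaks2, "troughs:", troughs2)
```

Comparing the two result lists shows whether the series turn at the same positions and by similar amounts.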


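1. Comparing two time series visually with Matplotlib:

import matplotlib.pyplot as plt

# Sample time series data
x = [1, 2, 3, 4, 5]
y1 = [10, 15, 13, 17, 20]
y2 = [8, 12, 14, 18, 22]

# Plot both series as line charts on the same axes
plt.plot(x, y1, label='y1')
plt.plot(x, y2, label='y2')

# Set figure properties
plt.title('Comparison of two time series')
plt.xlabel('x')
plt.ylabel('value')
plt.legend()

# Show the figure
plt.show()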

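2. Computing the correlation coefficient of two time series:

import numpy as np

# Sample time series data
x = [1, 2, 3, 4, 5]
y1 = [10, 15, 13, 17, 20]
y2 = [8, 12, 14, 18, 22]

# Compute the Pearson correlation coefficient
corr = np.corrcoef(y1, y2)[0, 1]

# Print the result
print('Correlation coefficient:', corr)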


3. Computing the dynamic time warping (DTW) distance between two time series:

# Sample time series data
x = [1, 2, 3, 4, 5]
y1 = [10, 15, 13, 17, 20]
y2 = [8, 12, 14, 18, 22]

# Dynamic time warping algorithm
def dtw_distance(ts_a, ts_b, d=lambda x, y: abs(x - y)):
    DTW = {}

    # Initialise the boundary conditions
    for i in range(len(ts_a)):
        DTW[(i, -1)] = float('inf')
    for i in range(len(ts_b)):
        DTW[(-1, i)] = float('inf')
    DTW[(-1, -1)] = 0

    # Fill in the DTW matrix
    for i in range(len(ts_a)):
        for j in range(len(ts_b)):
            cost = d(ts_a[i], ts_b[j])
            DTW[(i, j)] = cost + min(DTW[(i-1, j)], DTW[(i, j-1)], DTW[(i-1, j-1)])

    # Return the DTW distance
    return DTW[len(ts_a)-1, len(ts_b)-1]

# Compute the DTW distance between the two time series
dtw_dist = dtw_distance(y1, y2)

# Print the result
print('DTW distance:', dtw_dist)

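If my data is a two-dimensional array of shape N x (20×14), how do I convert it to (N, 1, 20, 14)?

If your data is a two-dimensional array of shape (N, 20×14), you can use NumPy's reshape method to convert it into a 4D array of shape (N, 1, 20, 14). For example:

import numpy as np
import torch

# Assume your data is a 2D array of shape (N, 20x14) named data
N, H, W = data.shape[0], 20, 14

# Change the shape from (N, 20x14) to (N, 1, 20, 14)
data = data.reshape(N, 1, H, W)

# Convert the NumPy array to a PyTorch tensor
data = torch.from_numpy(data)

Here we first set the height and width of the input to 20 and 14. We then use NumPy's reshape method to turn the (N, 20×14) array into a 4D array of shape (N, 1, 20, 14), where the added axis of size 1 is the channel dimension. Finally, we convert the NumPy array to a PyTorch tensor.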
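
To see what the reshape does to individual elements, the following pure-Python sketch mimics the default row-major (C-order) layout that reshape uses: element k of a flattened 20×14 row ends up at row k // 14, column k % 14. The helper name to_nchw is hypothetical, and nested lists stand in for arrays.

```python
H, W = 20, 14

def to_nchw(flat_rows, h=H, w=W):
    """Convert N flat rows of length h*w into nested lists shaped (N, 1, h, w)."""
    out = []
    for row in flat_rows:
        if len(row) != h * w:
            raise ValueError("each row must have h*w elements")
        image = [row[r * w:(r + 1) * w] for r in range(h)]  # h rows of w values
        out.append([image])  # wrap in a list to add the channel axis of size 1
    return out

# Two samples whose values equal their flat index, so positions are easy to check
data = [list(range(H * W)) for _ in range(2)]
result = to_nchw(data)

k = 45
r, c = divmod(k, W)  # row-major position of flat index k
print(len(result), len(result[0]), len(result[0][0]), len(result[0][0][0]))
print(result[0][0][r][c] == k)
```

If the flattened rows were stored column by column instead, the mapping would differ and a plain reshape would scramble the image, so it is worth checking a few known positions like this.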









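Before you begin, make sure Python and pip are installed on your computer. If they are not, see the article "Super Detailed Python Installation Guide" to install them.

(Optional 1) If you use Python for data analysis, you can install Anaconda directly; see "Anaconda, a Handy Helper for Python Data Analysis and Mining". It bundles Python and pip.

(Optional 2) The VSCode editor is also recommended for writing small Python projects; see "VSCode, the Best Partner for Python Programming: A Detailed Guide".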


xmlrpc.client ships with the Python 3 standard library, so no pip install is required for the example below.


import xmlrpc.client

# XML-RPC endpoint of the WordPress site
url = 'https://yourwebsite.com/xmlrpc.php'

# WordPress login credentials
username = 'your_username'
password = 'your_password'

# Create the XML-RPC client object
wp = xmlrpc.client.ServerProxy(url)

# Build the dictionary describing the post
post = {
    'post_title': 'My New Post',
    'post_content': 'This is the content of my new post.',
    'post_status': 'publish'
}

# Publish the post through the XML-RPC API (first argument is the blog ID)
post_id = wp.wp.newPost(0, username, password, post)

# Print the ID of the newly published post
print('New post published with ID:', post_id)
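
To inspect what such a call sends without contacting a server, xmlrpc.client.dumps from the standard library can serialise the same arguments into the XML-RPC request body. This is only an illustrative sketch; the credentials and the blog ID 0 are placeholders.

```python
import xmlrpc.client

post = {
    'post_title': 'My New Post',
    'post_content': 'This is the content of my new post.',
    'post_status': 'draft',  # a draft is a safer choice while experimenting
}

# Serialise the call wp.newPost(blog_id, username, password, post)
payload = xmlrpc.client.dumps(
    (0, 'your_username', 'your_password', post),
    methodname='wp.newPost',
)
print(payload)

# Round trip: parse the request back into its parameters and method name
params, method = xmlrpc.client.loads(payload)
print(method, params[3]['post_title'])
```

Printing the payload makes it easier to debug field names such as post_status before sending anything to a live site.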

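To deploy FastAPI on CentOS, you can follow these steps:

1. Install Python and pip

First, install Python and pip on CentOS with the following command:

sudo yum install python3 python3-pip

2. Install FastAPI and uvicorn

Next, install FastAPI and uvicorn with the following commands:

sudo pip3 install fastapi
sudo pip3 install uvicorn

3. Create the FastAPI application

Create your FastAPI application. You can write the code locally and upload it to the server, or create the code file directly on CentOS.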


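4. Run the FastAPI application

Run your FastAPI application with uvicorn. For example, if your application file is named main.py, you can start it with the following command:

uvicorn main:app --host 0.0.0.0 --port 8000

This starts the FastAPI application on CentOS and binds it to port 8000. The --host option needs a value; 0.0.0.0, assumed here, makes the server listen on all network interfaces.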


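5. Configure the firewall

If a firewall is enabled on your CentOS system, add a rule that allows traffic to the FastAPI application:

sudo firewall-cmd --permanent --add-port=8000/tcp
sudo firewall-cmd --reload

This allows incoming traffic to TCP port 8000 through the firewall.

Your FastAPI application is now deployed on CentOS and can be reached through the server's IP address or domain name.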
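
One way to confirm from another machine that the port is reachable is a plain TCP connection test. The sketch below uses only the standard library; the function name port_is_open is made up for this example, and the host in the usage line is a placeholder for your server's address.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Replace 127.0.0.1 with the server's IP address or domain name
    print(port_is_open("127.0.0.1", 8000))
```

A False result only says that the connection failed; the cause may be the firewall, the bind address, or the application not running.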


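ChatGPT stands for "Chat Generative Pre-trained Transformer". It is a conversational model developed by OpenAI and built on its Transformer-based GPT family of pre-trained language models. Models of this kind are widely used in natural language processing, including text generation, dialogue systems, and machine translation.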


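Partly. OpenAI has publicly released pre-trained weights for GPT-1 and GPT-2, and these can be loaded through third-party libraries such as Hugging Face. The GPT-3 weights have not been released; GPT-3 is available only through OpenAI's API.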




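  1. Hugging Face model hub: Hugging Face is a community for NLP models and tools that hosts a large number of pre-trained models, including weights for GPT-family models. You can browse the available GPT models at https://huggingface.co/models and download the corresponding weights.
  2. OpenAI: OpenAI is the developer of the GPT models. It has released weights for GPT and GPT-2, while GPT-3 is offered only through its API. You can see the available models at https://beta.openai.com/models/gpt.
  3. TensorFlow Hub: if you use TensorFlow, you can search for GPT models at https://tfhub.dev/s?q=gpt and download the corresponding weights.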




GPT-2 model: https://github.com/openai/gpt-2


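  1. Get the GPT-2 model: before using GPT-2 you need to download the model. You can download it from OpenAI's official release, or obtain it through a Python library that already wraps it.
  2. Prepare data: if you plan to fine-tune GPT-2, prepare a corpus. It can be your own text dataset or one collected from the internet.
  3. Run GPT-2: once the data and the model are ready, you can run GPT-2. Depending on your task, you may need to fine-tune the model or adjust hyperparameters to improve the generated text.
  4. Generate text: once the model is ready, use it to generate text, either by calling the model's API or with an existing tool.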





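  1. Task type: GPT-2 is a generative model that produces continuous text such as articles, stories, or dialogue. BERT is an encoder model geared toward understanding tasks on the input text, such as classification and regression.
  2. Input: GPT-2 takes a context sequence and predicts the next token, which suits language modeling and text generation. BERT takes a complete text and outputs a contextual representation of it, which suits text classification, sentiment analysis, and similar tasks.
  3. Training data: GPT-2 was trained on a large corpus of text collected from the web (WebText). BERT was also pre-trained on unlabeled text, namely BooksCorpus and English Wikipedia; task-specific datasets such as reading comprehension or question answering are used only for fine-tuning.
  4. Architecture: GPT-2 uses an autoregressive (Autoregressive Architecture), decoder-only Transformer: it generates the next token from the preceding ones and builds the sequence step by step. BERT uses an encoder-only Transformer with bidirectional self-attention: it encodes the input into representations and has no decoder.
  5. Pre-training objective: GPT-2 is trained with causal language modeling, predicting each token from the tokens before it. BERT is trained with two objectives: masked language modeling (Masked Language Modeling), which predicts hidden tokens from the context on both sides, and next sentence prediction (Next Sentence Prediction).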
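
The difference between the two pre-training objectives can be made concrete with a toy example. The sketch below uses plain Python lists rather than a real model: for a GPT-2-style causal objective each token is predicted from the tokens to its left only, while for a BERT-style masked objective selected positions are replaced by a [MASK] token and predicted from both sides. The function names and the sample sentence are made up for illustration, and real BERT masking is randomised rather than fixed.

```python
def causal_examples(tokens):
    """GPT-2 style: (context, target) pairs where context is everything to the left."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_example(tokens, masked_positions, mask_token="[MASK]"):
    """BERT style: hide the chosen positions; targets are the original tokens there."""
    corrupted = [mask_token if i in masked_positions else t
                 for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in sorted(masked_positions)}
    return corrupted, targets

tokens = ["the", "cat", "sat", "on", "the", "mat"]

# Causal objective: one training example per position after the first
for context, target in causal_examples(tokens):
    print(context, "->", target)

# Masked objective: the model sees the whole corrupted sentence at once
corrupted, targets = masked_example(tokens, {2, 5})
print(corrupted, targets)
```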


Does Python SciPy need BLAS?

    Blas (http://www.netlib.org/blas/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [blas]) or by setting
    the BLAS environment variable.



Which tar do I need to download off this site?

I’ve tried the fortrans, but I keep getting this error (after setting the environment variable obviously).

Answer 0


mkdir -p ~/src/
cd ~/src/
wget http://www.netlib.org/blas/blas.tgz
tar xzf blas.tgz
cd BLAS-*

## NOTE: The selected Fortran compiler must be consistent for BLAS, LAPACK, NumPy, and SciPy.
## For GNU compiler on 32-bit systems:
#g77 -O2 -fno-second-underscore -c *.f                     # with g77
#gfortran -O2 -std=legacy -fno-second-underscore -c *.f    # with gfortran
## OR for GNU compiler on 64-bit systems:
#g77 -O3 -m64 -fno-second-underscore -fPIC -c *.f                     # with g77
gfortran -O3 -std=legacy -m64 -fno-second-underscore -fPIC -c *.f    # with gfortran
## OR for Intel compiler:
#ifort -FI -w90 -w95 -cm -O3 -unroll -c *.f

# Continue below irrespective of compiler:
ar r libfblas.a *.o
ranlib libfblas.a
rm -rf *.o
export BLAS=~/src/BLAS-*/libfblas.a

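Execute only one of the five g77/gfortran/ifort commands. I have commented out all but the gfortran one, which I use. The subsequent LAPACK installation requires a Fortran 90 compiler, and since both installs should use the same Fortran compiler, g77 should not be used for BLAS.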


mkdir -p ~/src
cd ~/src/
wget http://www.netlib.org/lapack/lapack.tgz
tar xzf lapack.tgz
cd lapack-*/
cp INSTALL/make.inc.gfortran make.inc          # On Linux with lapack-3.2.1 or newer
make lapacklib
make clean
export LAPACK=~/src/lapack-*/liblapack.a

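Update on 3 September 2015: verified some of the comments today (thanks to all). Before running make lapacklib, edit the make.inc file and add the -fPIC option to the OPTS and NOOPT settings. If you are on a 64-bit architecture or want to compile for one, also add -m64. It is important that BLAS and LAPACK are compiled with these options set to the same values. If you forget -fPIC, SciPy will give you an error about missing symbols and recommend this switch. The relevant section of make.inc in my setup looks like this: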

FORTRAN  = gfortran 
OPTS     = -O2 -frecursive -fPIC -m64
NOOPT    = -O0 -frecursive -fPIC -m64
LOADER   = gfortran

On old machines (e.g. RedHat 5), gfortran may be installed in an older version (e.g. 4.1.2) that does not understand the option -frecursive. In that case, simply remove it from the make.inc file.



The lapack test target of the Makefile fails in my setup because it cannot find the blas libraries. If you are thorough you can temporarily move the blas library to the specified location to test the lapack. I’m a lazy person, so I trust the devs to have it working and verify only in SciPy.

Answer 1





If you need to use the latest versions of SciPy rather than the packaged version, without going through the hassle of building BLAS and LAPACK, you can follow the below procedure.

Install linear algebra libraries from repository (for Ubuntu),

sudo apt-get install gfortran libopenblas-dev liblapack-dev

Then install SciPy, (after downloading the SciPy source): python setup.py install or

pip install scipy

As the case may be.

Answer 2



On Fedora, this works:

 yum install lapack lapack-devel blas blas-devel
 pip install numpy
 pip install scipy

Remember to install 'lapack-devel' and 'blas-devel' in addition to 'blas' and 'lapack', otherwise you'll get the error you mentioned or the "numpy.distutils.system_info.LapackNotFoundError" error.
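
Before building anything, it can help to check whether the system already exposes BLAS and LAPACK shared libraries. A minimal sketch using ctypes.util.find_library from the standard library follows; the helper name report_libraries is made up here, and the check covers shared libraries only, so it says nothing about the -devel headers or static archives that a source build may also need.

```python
from ctypes.util import find_library

def report_libraries(names=("blas", "lapack", "openblas")):
    """Map each library name to the file name found on the system, or None."""
    return {name: find_library(name) for name in names}

for name, found in report_libraries().items():
    print(f"{name}: {found if found else 'not found'}")
```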

Answer 3




I guess you are talking about installation in Ubuntu. Just use:

apt-get install python-numpy python-scipy

That should take care of the BLAS libraries compiling as well. Else, compiling the BLAS libraries is very difficult.

Answer 4


For Windows users there is a nice binary package by Chris (warning: it’s a pretty large download, 191 MB):

Answer 5

Following the instructions given by "cfi" worked for me, although they left out a few pieces that you might need:


cd ~/src
mv lapack-[tab] LAPACK


cd ~/src/LAPACK 
cp lapack_LINUX.a libflapack.a

Answer 6



Try using

sudo apt-get install python3-scipy





I have a multi-threading Python program, and a utility function, writeLog(message), that writes out a timestamp followed by the message. Unfortunately, the resultant log file gives no indication of which thread is generating which message.

I would like writeLog() to be able to add something to the message to identify which thread is calling it. Obviously I could just make the threads pass this information in, but that would be a lot more work. Is there some thread equivalent of os.getpid() that I could use?

Answer 0


threading.get_ident() works, or threading.current_thread().ident (or threading.currentThread().ident for Python < 2.6).
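
As a sketch of how this could be wired into the logging helper from the question, the version of writeLog below adds the calling thread's name and ident to each line. The in-memory log_lines list stands in for the log file, and the line format and thread names are assumptions made for this example.

```python
import threading
import time

log_lines = []                # stands in for the log file in this sketch
log_lock = threading.Lock()   # keeps concurrent writers from interleaving

def writeLog(message):
    """Record a timestamp, the calling thread's name and ident, then the message."""
    thread = threading.current_thread()
    line = "%s [%s %d] %s" % (
        time.strftime("%Y-%m-%d %H:%M:%S"), thread.name,
        threading.get_ident(), message)
    with log_lock:
        log_lines.append(line)

workers = [threading.Thread(target=writeLog, args=("hello",), name="worker-%d" % i)
           for i in range(3)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print("\n".join(log_lines))
```

Because the thread is looked up inside writeLog, callers do not have to pass any extra information.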

Answer 1





Using the logging module you can automatically add the current thread identifier in each log entry. Just use one of these LogRecord mapping keys in your logger format string:

%(thread)d : Thread ID (if available).

%(threadName)s : Thread name (if available).

and set up your default handler with it:
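
A minimal sketch of such a handler setup is shown here. The logger name, the StringIO stream, and the exact format layout are assumptions chosen for illustration rather than the original answer's code.

```python
import io
import logging
import threading

stream = io.StringIO()  # captures output here; use a file or stderr in practice
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(asctime)s [%(threadName)s %(thread)d] %(message)s"))

logger = logging.getLogger("threaded_app")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)

def work():
    # The formatter fills in the thread name and ID automatically
    logger.info("doing work")

t = threading.Thread(target=work, name="worker-1")
t.start()
t.join()
print(stream.getvalue())
```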


Answer 2



import ctypes
libc = ctypes.cdll.LoadLibrary('libc.so.6')

# System dependent, see e.g. /usr/include/x86_64-linux-gnu/asm/unistd_64.h
SYS_gettid = 186

def getThreadId():
   """Returns OS thread id - Specific to Linux"""
   return libc.syscall(SYS_gettid)
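
On Python 3.8 and later, the standard library exposes the OS-level thread ID through threading.get_native_id(), which avoids hard-coding a syscall number. The sketch below falls back to threading.get_ident() where that function is missing; the fallback value is a Python-level identifier, not the kernel's thread ID, and the helper name os_thread_id is made up for this example.

```python
import threading

def os_thread_id():
    """Return the OS-assigned thread ID where available (Python 3.8+)."""
    if hasattr(threading, "get_native_id"):
        return threading.get_native_id()
    # Fallback: Python-level identifier, not the kernel's thread ID
    return threading.get_ident()

print("main thread:", os_thread_id())
```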

Answer 3

Answer 4


import threading

class myThread(threading.Thread):
    def __init__(self, threadID, name, counter):
        super().__init__(name=name)  # initialise the base Thread first
        self.threadID = threadID



Answer 5

import threading

def worker():
    # Print the name of the thread running this function
    print(threading.current_thread().name)

threading.Thread(target=worker, name='foo').start()


Answer 6

I created multiple threads in Python, printed the thread objects, and printed the ID using the ident variable. I see that all the IDs are the same:

<Thread(Thread-1, stopped 140500807628544)>
<Thread(Thread-2, started 140500807628544)>
<Thread(Thread-3, started 140500807628544)>

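Answer 7

Similarly to @brucexin, I needed the OS-level thread identifier (which != thread.get_ident()) and used something like the code below so as not to depend on particular syscall numbers or be amd64-only: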

---- 8< ---- (xos.pyx)
"""module xos complements standard module os""" 

cdef extern from "<sys/syscall.h>":                                                             
    long syscall(long number, ...)                                                              
    const int SYS_gettid                                                                        

# gettid returns current OS thread identifier.                                                  
def gettid():                                                                                   
    return syscall(SYS_gettid)                                                                  

---- 8< ---- (test.py)
import pyximport; pyximport.install()
import xos


print 'my tid: %d' % xos.gettid()


this depends on Cython though.





ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |




import sys, argparse, csv
from settings import *

# command arguments
parser = argparse.ArgumentParser(description='csv to postgres',\
 fromfile_prefix_chars="@" )
parser.add_argument('file', help='csv file to import', action='store')
args = parser.parse_args()
csv_file = args.file

# open csv file
with open(csv_file, 'rb') as csvfile:

    # get number of columns
    for line in csvfile.readlines():
        array = line.split(',')
        first_item = array[0]

    num_columns = len(array)

    reader = csv.reader(csvfile, delimiter=' ')
        included_cols = [1, 2, 6, 7]

    for row in reader:
            content = list(row[i] for i in included_cols)
            print content


Answer 0





The only way you would be getting the last column from this code is if you don’t include your print statement in your for loop.

This is most likely the end of your code:

for row in reader:
    content = list(row[i] for i in included_cols)
print content

You want it to be this:

for row in reader:
        content = list(row[i] for i in included_cols)
        print content

Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module.

Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:

import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']

so if you wanted to save all of the info in your column Names into a variable, this is all you need to do:

names = df.Names

It’s a great module and I suggest you look into it. If for some reason your print statement was in for loop and it was still only printing out the last column, which shouldn’t happen, but let me know if my assumption was wrong. Your posted code has a lot of indentation errors so it was hard to know what was supposed to be where. Hope this was helpful!
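
For readers on Python 3, the corrected loop can be sketched with the standard csv module as follows. The in-memory sample and its values are invented for this example; with a real file you would pass open(csv_file, newline='') to csv.reader instead.

```python
import csv
import io

# In-memory stand-in for the CSV file; the values are invented for this example
sample = io.StringIO(
    "ID,Name,Address,City,State,Zip,Phone,OPEID,IPEDS\n"
    "10,Adam,130 W St,Montgomery,AL,36104,334-555-0100,01023,10063\n"
    "11,Carl,22 Elm St,Mobile,AL,36602,251-555-0101,01024,10064\n"
)

included_cols = [1, 2, 6, 7]
reader = csv.reader(sample)
header = next(reader)                      # skip the header row
selected = []
for row in reader:
    content = [row[i] for i in included_cols]
    selected.append(content)               # collect inside the loop, once per row
    print(content)
```

Keeping both the append and the print inside the loop body is what makes every row appear, not just the last one.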

Answer 1

import csv
from collections import defaultdict

columns = defaultdict(list) # each value in each column is appended to a list

with open('file.txt') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        for (k,v) in row.items(): # go over each column name and value 
            columns[k].append(v) # append the value into the appropriate list
                                 # based on column name k



Bob,0893,32 Silly
James,000,400 McHilly
Smithers,4442,23 Looped St.


['Bob', 'James', 'Smithers']
['0893', '000', '4442']
['32 Silly', '400 McHilly', '23 Looped St.']


with open('file.txt') as f:
    reader = csv.reader(f)
    for row in reader:
        for (i,v) in enumerate(row):
            columns[i].append(v) # group values by column position i
print(columns[0])

['Bob', 'James', 'Smithers']

To change the delimiter, add delimiter=" " to the appropriate instantiation, i.e. reader = csv.reader(f, delimiter=" ")



Answer 2


import pandas as pd
my_csv = pd.read_csv(filename)
column = my_csv.column_name
# you can also use my_csv['column_name']


my_filtered_csv = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])


Answer 3


df = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])


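import pandas as pd
import io

# Sample rows; the columns that are not selected are reconstructed for illustration
s = '''total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
'''

df = pd.read_csv(io.StringIO(s), usecols=['total_bill', 'day', 'size'])
print(df)

   total_bill  day  size
0       16.99  Sun     2
1       10.34  Sun     3
2       21.01  Sun     3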

Answer 4


ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | Adam | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Carl | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Adolf | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Den | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |


import numpy as np

b = np.genfromtxt(r'filepath\name.csv', delimiter='|', names=True, dtype=None)
>>> b['Name']
array([' Adam ', ' Carl ', ' Adolf ', ' Den '], 

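Answer 5

Context: for this type of work you should use the amazing Python petl library. It will save you a lot of work and potential frustration compared with doing things "manually" with the standard csv module. AFAIK, the only people who still use the csv module are those who have not yet discovered better tools for working with tabular data (pandas, petl, etc.). That is fine, but if you plan to work with a lot of data from various strange sources in your career, learning something like petl is one of the best investments you can make. Getting started should take only 30 minutes once you have run pip install petl. The documentation is excellent.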


from petl import fromcsv, look, cut, tocsv

# Load the table
table1 = fromcsv('table1.csv')
# Keep only the selected columns
table2 = cut(table1, 'Song_Name', 'Artist_ID')
# Have a quick look to make sure things are OK. Prints a nicely formatted table to your console
print(look(table2))
# Save to a new file
tocsv(table2, 'new.csv')

Answer 6

I think there is an easier way:

import pandas as pd

dataset = pd.read_csv('table1.csv')
ftCol = dataset.iloc[:, 0].values

In iloc[:, 0], the : selects all rows and 0 is the position of the column. In the example below, the ID column would be selected:

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

Answer 7

import pandas as pd
csv_file = pd.read_csv("file.csv")
column_val_list = csv_file.column_name.values  # NumPy array of the column's values

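Answer 8


myVar = pd.read_csv('YourPath', sep = ",")['ColumnName']


The code snippet above produces a pandas Series, not a DataFrame. The suggestion from ayhan to use usecols will also be faster if speed is an issue. Testing the two approaches with %timeit on a 2122 KB CSV file yields 22.8 ms for the usecols approach and 53 ms for my suggested approach.

And don't forget import pandas as pd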

Answer 9


ids, names, zips, phones = zip(*(
  (row[1], row[2], row[6], row[7])
  for row in reader
))
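Answer 10

To fetch the column names, it is better to use readline() instead of readlines(), which avoids looping over the complete file, reading all of it, and storing it in an array.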

with open(csv_file, 'rb') as csvfile:

    # get number of columns

    line = csvfile.readline()

    first_item = line.split(',')
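
A variant of the same idea, sketched for Python 3, lets csv.reader parse the first line so that quoted fields containing commas are handled correctly, which a plain split(',') would break. The helper name read_header and the sample data are assumptions made for this example.

```python
import csv
import io

def read_header(fileobj):
    """Return the column names from the first line without reading the rest."""
    return next(csv.reader(fileobj), [])

# In-memory stand-in for the file; note the comma inside the quoted field
sample = io.StringIO('ID,"Name, full",City\n10,"Smith, Bob",Mobile\n')
columns = read_header(sample)
print(columns, len(columns))
```

The number of columns is then simply len(columns), and the file object is left positioned at the first data row.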

