Tired of messy dependency trees in large Python projects? Untangle them in one command

You have probably run into errors caused by mismatched package versions. Because pip freeze prints every dependency as one flat list, tracking down the package with the offending version is tedious. This is where a tool you really should know comes in: pipdeptree.

pipdeptree is a command-line utility that displays the installed Python packages in the form of a dependency tree.

It works both for packages installed globally on a machine and for packages installed in a virtual environment such as virtualenv.

1. Installation

Just run the following command in your virtual environment to install pipdeptree:

pip install pipdeptree

Tested Python versions: 2.7, 3.5, 3.6, 3.7, 3.8 and 3.9, as well as pypy2 and pypy3.

2. Usage and examples

The biggest difference between pip freeze and pipdeptree:

# what pip freeze shows
$ pip freeze
Flask==0.10.1
itsdangerous==0.24
Jinja2==2.11.2
-e git+git@github.com:naiquevin/lookupy.git@cdbe30c160e1c29802df75e145ea4ad903c05386#egg=Lookupy
MarkupSafe==0.22
pipdeptree @ file:///private/tmp/pipdeptree-2.0.0b1-py3-none-any.whl
Werkzeug==0.11.2

As you can see, pip freeze gives you no more than a flat list of dependencies, whereas pipdeptree shows each module's dependency relationships at a glance:

$ pipdeptree
Warning!!! Possibly conflicting dependencies found:
* Jinja2==2.11.2
 - MarkupSafe [required: >=0.23, installed: 0.22]
------------------------------------------------------------------------
Flask==0.10.1
  - itsdangerous [required: >=0.21, installed: 0.24]
  - Jinja2 [required: >=2.4, installed: 2.11.2]
    - MarkupSafe [required: >=0.23, installed: 0.22]
  - Werkzeug [required: >=0.7, installed: 0.11.2]
Lookupy==0.1
pipdeptree==2.0.0b1
  - pip [required: >=6.0.0, installed: 20.1.1]
setuptools==47.1.1
wheel==0.34.2

Note the Warning: it tells you which modules require versions that conflict with what is actually installed. This is an extremely useful hint, because very often this is exactly where the problem lies.
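
To find out which packages pull in a conflicting dependency such as MarkupSafe, you can also print the tree bottom-up with the reverse flags. A minimal sketch (the exact output layout may differ between pipdeptree versions):

$ pipdeptree --reverse --packages MarkupSafe
MarkupSafe==0.22
  - Jinja2==2.11.2 [requires: MarkupSafe>=0.23]
    - Flask==0.10.1 [requires: Jinja2>=2.4]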

On top of that, if circular dependencies exist, for example:

CircularDependencyA => CircularDependencyB => CircularDependencyA

it warns you like this:

$ pipdeptree --exclude pip,pipdeptree,setuptools,wheel
Warning!!! Cyclic dependencies found:
- CircularDependencyA => CircularDependencyB => CircularDependencyA
- CircularDependencyB => CircularDependencyA => CircularDependencyB
------------------------------------------------------------------------
wsgiref==0.1.2
argparse==1.2.1

If you want to generate a requirements.txt, you can do it like this:

$ pipdeptree -f | tee locked-requirements.txt
Flask==0.10.1
  itsdangerous==0.24
  Jinja2==2.11.2
    MarkupSafe==0.23
  Werkzeug==0.11.2
gnureadline==8.0.0
-e git+git@github.com:naiquevin/lookupy.git@cdbe30c160e1c29802df75e145ea4ad903c05386#egg=Lookupy
pipdeptree @ file:///private/tmp/pipdeptree-2.0.0b1-py3-none-any.whl
  pip==20.1.1
setuptools==47.1.1
wheel==0.34.2

Once you have confirmed there are no conflicting dependencies, you can even "lock" them, pinning every package to its currently installed version:

$ pipdeptree -f | sed 's/ //g' | sort -u > locked-requirements.txt
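
The locked file can then be fed back to pip like any other requirements file:

pip install -r locked-requirements.txt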

3. Visualizing the dependency tree

To render the dependency tree graphically, we need to install GraphViz (see this article for an installation walkthrough: Python 一键转化代码为流程图). Once it is installed, run:

pipdeptree --graph-output png > dependencies.png

# pipdeptree --graph-output dot > dependencies.dot
# pipdeptree --graph-output pdf > dependencies.pdf
# pipdeptree --graph-output svg > dependencies.svg

Four output formats are supported. Here is what the png output looks like:

The result is quite nice. If you have a large project whose dependencies need cleaning up, give pipdeptree a try.


A super-convenient, lightweight Python pipeline tool with a beautiful visual interface!

Mara-pipelines is a lightweight data transformation framework, built for transparency and low complexity. Its other highlights:

  • Pipelines are written as plain, very simple Python code.
  • PostgreSQL serves as the data processing engine.
  • A web UI visualizes and analyzes pipeline execution.
  • Single-machine pipeline execution based on Python's multiprocessing. No distributed task queue required, and debugging and log output are easy.
  • Cost-based priority queue: nodes with the highest cost (based on recorded runtimes) run first.

What's more, from the Mara-pipelines web UI you can not only inspect and manage pipelines and their task nodes, you can also trigger those pipelines and nodes directly. Very handy:

1. Installation

Because of its heavy use of dependencies, Mara-pipelines does not run on Windows. If you need it on Windows, use Docker or the Windows Subsystem for Linux.

Install Mara-pipelines with pip:

pip install mara-pipelines

or:

pip install git+https://github.com/mara/mara-pipelines.git

2. Usage example

Here is a basic demo pipeline made up of three inter-dependent nodes: task 1 (ping_localhost), a sub-pipeline (sub_pipeline), and task 2 (sleep):

# Note: this example pings a few sites that may be unreachable from some networks; substitute hosts you can reach if needed.
from mara_pipelines.commands.bash import RunBash
from mara_pipelines.pipelines import Pipeline, Task
from mara_pipelines.ui.cli import run_pipeline, run_interactively

pipeline = Pipeline(
    id='demo',
    description='A small pipeline that demonstrates the interplay between pipelines, tasks and commands')

pipeline.add(Task(id='ping_localhost', description='Pings localhost',
                  commands=[RunBash('ping -c 3 localhost')]))

sub_pipeline = Pipeline(id='sub_pipeline', description='Pings a number of hosts')

for host in ['google', 'amazon', 'facebook']:
    sub_pipeline.add(Task(id=f'ping_{host}', description=f'Pings {host}',
                          commands=[RunBash(f'ping -c 3 {host}.com')]))

sub_pipeline.add_dependency('ping_amazon', 'ping_facebook')
sub_pipeline.add(Task(id='ping_foo', description='Pings foo',
                      commands=[RunBash('ping foo')]), ['ping_amazon'])

pipeline.add(sub_pipeline, ['ping_localhost'])

pipeline.add(Task(id='sleep', description='Sleeps for 2 seconds',
                  commands=[RunBash('sleep 2')]), ['sub_pipeline'])

As you can see, a Task contains several commands, and it is the commands that do the actual work. In pipeline.add, the first argument is the node to add and the second argument is that node's upstream. For example:

pipeline.add(sub_pipeline, ['ping_localhost'])

means that sub_pipeline only runs once ping_localhost has finished.

To run the pipeline, you need to configure a PostgreSQL database to store run-time information, run output and incremental processing state:

import mara_db.auto_migration
import mara_db.config
import mara_db.dbs

mara_db.config.databases \
    = lambda: {'mara': mara_db.dbs.PostgreSQLDB(host='localhost', user='root', database='example_etl_mara')}

mara_db.auto_migration.auto_discover_models_and_migrate()

If PostgreSQL is running and the credentials are correct, the output looks like this (a database containing a number of tables is created):

Created database "postgresql+psycopg2://root@localhost/example_etl_mara"

CREATE TABLE data_integration_file_dependency (
    node_path TEXT[] NOT NULL, 
    dependency_type VARCHAR NOT NULL, 
    hash VARCHAR, 
    timestamp TIMESTAMP WITHOUT TIME ZONE, 
    PRIMARY KEY (node_path, dependency_type)
);

.. more tables

To run this pipeline:

from mara_pipelines.ui.cli import run_pipeline

run_pipeline(pipeline)

And this runs a single pipeline node together with all of the nodes it depends on:

run_pipeline(sub_pipeline, nodes=[sub_pipeline.nodes['ping_amazon']], with_upstreams=True)
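
As a side note, the import at the top of the example also exposes run_interactively. A minimal sketch (assuming it matches the import shown above) that opens a console menu for picking and running nodes:

from mara_pipelines.ui.cli import run_interactively

# opens an interactive console menu for selecting and running pipeline nodes
run_interactively()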

3. Web UI

In my view the most useful part of mara-pipelines is its Flask-based web UI for managing pipelines.

For each pipeline there is a page showing:

  • a graph of all child nodes and the dependencies between them
  • charts of the pipeline's overall runtime and of the most expensive nodes over the last 30 days (configurable)
  • a table of all pipeline nodes with their average runtimes and the queuing priorities derived from them
  • the output and a timeline of the pipeline's last run

For each task there is a page showing:

  • the task's upstreams and downstreams within the pipeline
  • the task's runtimes over the last 30 days
  • all of the task's commands
  • the output of the task's last run

On top of that, pipelines and tasks can be triggered straight from the web page, which is a fantastic feature:


Vulture: find all the dead Python code in your project in one command

Vulture finds unused code in Python programs. This is useful for cleaning up large projects (codebases) and for finding bugs in them.

That said, because of Python's dynamic nature, a static analyzer like Vulture is likely to miss some dead code, and it may flag code that is only invoked implicitly as unused.

Even so, Vulture can be a very useful tool for raising code quality.

1. Features

  • fast: static code analysis
  • tested
  • complements pyflakes and uses the same output syntax
  • can sort unused classes and functions by size with --sort-by-size
  • supports Python >= 3.6

2. Preparation

Before you start, make sure Python and pip are installed on your machine. If not, see this article for installation instructions: 超详细Python安装指南.

(Option 1) If your main use of Python is data analysis, you can install Anaconda directly (Python数据分析与挖掘好帮手—Anaconda), which ships with both Python and pip.

(Option 2) We also recommend the VSCode editor for writing small Python projects: Python 编程的最好搭档—VSCode 详细指南.

On Windows, open Cmd (Start - Run - CMD); on macOS, open Terminal (command+space, type Terminal). Then install the dependency:

pip install vulture

3. Usage

You can run vulture directly as a command-line tool:

vulture myscript.py  # or
python3 -m vulture myscript.py  # or
vulture myscript.py mypackage/  # or
vulture myscript.py --min-confidence 100  # only report dead code that is 100% certain

If vulture has not been added to your PATH (on Windows, for example, this does not happen automatically), invoke it via python -m vulture instead.

As you can see, the command's arguments can be Python files or directories. For each directory, Vulture analyzes all contained *.py files.

Vulture assigns each chunk of dead code a confidence value. A confidence of 100% means the code is guaranteed to be unused.

After finding and removing dead code, run Vulture again: it may discover further dead code.

Here is an example. Consider the following code:

import os

class Greeter:
    def greet(self):
        print("Hi")

def hello_world():
    message = "Hello, world!"
    greeter = Greeter()
    greet_func = getattr(greeter, "greet")
    greet_func()

if __name__ == "__main__":
    hello_world()

Run vulture on it:

vulture dead_code.py
# or
python -m vulture dead_code.py

The output looks like this:

dead_code.py:1: unused import 'os' (90% confidence)
dead_code.py:4: unused function 'greet' (60% confidence)
dead_code.py:8: unused variable 'message' (60% confidence)

Vulture correctly reports "os" and "message" as unused, but it fails to detect that "greet" is actually used. The recommended way to handle such false positives is to create a whitelist Python file; see section 4 below.

4. Handling false positives

When Vulture wrongly reports a chunk of code as unused, you have several options for suppressing the false positive. If fixing the false positive would benefit other users too, please file an issue report.

Whitelists

The recommended option is to add used code that is reported as unused to a Python module and include that module in the list of scanned paths. To generate such a whitelist automatically, pass --make-whitelist to Vulture:

vulture mydir --make-whitelist > whitelist.py
vulture mydir whitelist.py

Note that the generated whitelist.py file will contain valid Python syntax, but for Python to actually run it, you usually need to make a few modifications.
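
For the Greeter example above, a hand-written whitelist might look roughly like this (a hypothetical sketch; the auto-generated file has a similar shape):

# whitelist.py - marks Greeter.greet as used so Vulture stops reporting it
from dead_code import Greeter

# referenced dynamically via getattr() in hello_world(), so it is not dead code
Greeter.greet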

Ignoring files

To ignore whole files or directories, use the --exclude parameter (e.g., --exclude *settings.py,docs/).

Flake8 noqa comments

For compatibility with flake8, Vulture supports the F401 and F841 error codes for ignoring unused imports (# noqa: F401) and unused local variables (# noqa: F841). However, we recommend whitelists over noqa comments, since noqa comments add visual noise to the code and make it harder to read.

Ignoring names

You can also use --ignore-names foo*,ba[rz] to make Vulture ignore all names starting with foo as well as the names bar and baz. In addition, the --ignore-decorators option ignores functions decorated with the given decorators. This is helpful in Flask projects, where --ignore-decorators "@app.route" ignores all @app.route functions, as shown below.
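
Putting both options together on the command line looks like this:

vulture mydir --ignore-names "foo*,ba[rz]" --ignore-decorators "@app.route"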

We recommend whitelists over --ignore-names and --ignore-decorators, because a whitelist is automatically checked for syntactic correctness when it is passed to Vulture.


Addict: a dict module that is a genuine pleasure to write with

Addict is a Python module that, on top of the standard dict syntax, lets you both get and set a dict's values through attribute access.

That means you no longer have to write dicts like this:

body = {
    'query': {
        'filtered': {
            'query': {
                'match': {'description': 'addictive'}
            },
            'filter': {
                'term': {'created_by': 'Mats'}
            }
        }
    }
}

Instead, you can accomplish the same thing in three lines:

body = Dict()
body.query.filtered.query.match.description = 'addictive'
body.query.filtered.filter.term.created_by = 'Mats'

1. Installation

You can install it via pip:

pip install addict

or via conda:

conda install addict -c conda-forge

Addict runs on both Python 2.7+ and Python 3.

2. Usage

Addict inherits from dict, but it is far more flexible in how values are accessed and set. Working with dictionaries becomes a joy!

Setting items of nested dicts is extremely comfortable:

>>> from addict import Dict
>>> mapping = Dict()
>>> mapping.a.b.c.d.e = 2
>>> mapping
{'a': {'b': {'c': {'d': {'e': 2}}}}}

If a Dict is instantiated with any iterable values, it iterates over and clones those values, writing them into the corresponding attributes and values. For example:

>>> mapping = {'a': [{'b': 3}, {'b': 3}]}
>>> dictionary = Dict(mapping)
>>> dictionary.a[0].b
3

mapping['a'] is no longer the same object as dictionary['a']:

>>> mapping['a'] is dictionary['a']
False

Of course, this only applies to the constructor, not to attribute access or value assignment:

>>> a = Dict()
>>> b = [1, 2, 3]
>>> a.b = b
>>> a.b is b
True

3. Things to keep in mind

Remember that an int is not a valid attribute name, so dict keys that are not strings must be set and read with the getitem/setitem syntax:

>>> addicted = Dict()
>>> addicted.a.b.c.d.e = 2
>>> addicted[2] = [1, 2, 3]
>>> addicted
{2: [1, 2, 3], 'a': {'b': {'c': {'d': {'e': 2}}}}}

You can, however, freely mix the two syntaxes:

>>> addicted.a.b['c'].d.e
2

4. Attributes such as keys, items, etc.

Addict won't let you override dict's own attributes, so the following does not work:

>>> mapping = Dict()
>>> mapping.keys = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "addict/addict.py", line 53, in __setattr__
    raise AttributeError("'Dict' object attribute '%s' is read-only" % name)
AttributeError: 'Dict' object attribute 'keys' is read-only

Item syntax, on the other hand, works fine:

>>> a = Dict()
>>> a['keys'] = 2
>>> a
{'keys': 2}
>>> a['keys']
2

5. Default values

For keys that are missing from the dict, Addict behaves like defaultdict(Dict): a missing key returns an empty Dict rather than raising a KeyError. If you don't want this behavior, you can restore KeyError-raising like this:

>>> class DictNoDefault(Dict):
...     def __missing__(self, key):
...         raise KeyError(key)

Note, however, that you then lose the shorthand assignment feature (addicted.a.b.c.d.e = 2).
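
To see why, with the DictNoDefault sketch above, a chained assignment fails at the first missing key (illustration only):

>>> d = DictNoDefault()
>>> d.a.b.c.d.e = 2
Traceback (most recent call last):
  ...
KeyError: 'a'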

6. Converting to a plain dict

If you don't feel safe passing an Addict into other functions or modules, use the to_dict() method: it returns the Addict converted into a plain dict.

>>> regular_dict = my_addict.to_dict()
>>> regular_dict.a = 2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'a'

This is perfect when you want to build up a nested dict in a few lines of code and then hand it off to a different function or module:

body = Dict()
body.query.filtered.query.match.description = 'addictive'
body.query.filtered.filter.term.created_by = 'Mats'
third_party_module.search(query=body.to_dict())

7. Counting

Dict's ability to easily access and modify deeply nested attributes makes it ideal for counting. With Addict you can also easily count across multiple levels, much in the spirit of collections.Counter.

Take the following data:

data = [
    {'born': 1980, 'gender': 'M', 'eyes': 'green'},
    {'born': 1980, 'gender': 'F', 'eyes': 'green'},
    {'born': 1980, 'gender': 'M', 'eyes': 'blue'},
    {'born': 1980, 'gender': 'M', 'eyes': 'green'},
    {'born': 1980, 'gender': 'M', 'eyes': 'green'},
    {'born': 1980, 'gender': 'F', 'eyes': 'blue'},
    {'born': 1981, 'gender': 'M', 'eyes': 'blue'},
    {'born': 1981, 'gender': 'F', 'eyes': 'green'},
    {'born': 1981, 'gender': 'M', 'eyes': 'blue'},
    {'born': 1981, 'gender': 'F', 'eyes': 'blue'},
    {'born': 1981, 'gender': 'M', 'eyes': 'green'},
    {'born': 1981, 'gender': 'F', 'eyes': 'blue'}
]

If you want to count how many people born in each year, per gender, have each eye color, you can compute that information very easily:

counter = Dict()

for row in data:
    born = row['born']
    gender = row['gender']
    eyes = row['eyes']
    counter[born][gender][eyes] += 1

print(counter)

{1980: {'M': {'blue': 1, 'green': 3}, 'F': {'blue': 1, 'green': 1}}, 1981: {'M': {'blue': 2, 'green': 1}, 'F': {'blue': 2, 'green': 1}}}

8. Updating

A plain dict updates like this:

>>> d = {'a': {'b': 3}}
>>> d.update({'a': {'c': 4}})
>>> print(d)
{'a': {'c': 4}}

addict updates like this instead: it recurses and actually merges the nested dicts:

>>> D = Dict({'a': {'b': 3}})
>>> D.update({'a': {'c': 4}})
>>> print(D)
{'a': {'b': 3, 'c': 4}}

9. Why addict?

This module grew entirely out of the tedium of building Elasticsearch queries in Python. Whenever you catch yourself writing convoluted dictionary logic, remember that you don't have to: just use Addict.


Tqsdk-python: the TianQin quant development kit for futures quant trading, with live quotes, historical data and live trading

TqSdk, the TianQin quantitative trading strategy development kit

TqSdk is an open-source Python library initiated by Shinny Technology (信易科技), which contributes most of its code. Built on the mature trading and market-data server infrastructure that Kuaiqi (快期) has accumulated over many years, TqSdk lets users build all kinds of quantitative trading strategy programs with very little code, and offers a complete solution for futures, options and stocks covering historical data, real-time data, development and debugging, strategy backtesting, simulated trading, live trading, runtime monitoring and risk management.

from tqsdk import TqApi, TqAuth, TqAccount, TargetPosTask

api = TqApi(TqAccount("H海通期货", "4003242", "123456"), auth=TqAuth("信易账户", "账户密码"))      # create the TqApi instance with a trading account
q_1910 = api.get_quote("SHFE.rb1910")                         # subscribe to near-month contract quotes
t_1910 = TargetPosTask(api, "SHFE.rb1910")                    # position-adjustment helper for the near-month contract
q_2001 = api.get_quote("SHFE.rb2001")                         # subscribe to far-month contract quotes
t_2001 = TargetPosTask(api, "SHFE.rb2001")                    # position-adjustment helper for the far-month contract

while True:
  api.wait_update()                                           # wait for data updates
  spread = q_1910["last_price"] - q_2001["last_price"]        # near-month minus far-month spread
  print("current spread:", spread)
  if spread > 250:
    print("spread too high: short the near month, long the far month")
    t_1910.set_target_volume(-1)                              # adjust the 1910 contract to a 1-lot short position
    t_2001.set_target_volume(1)                               # adjust the 2001 contract to a 1-lot long position
  elif spread < 200:
    print("spread back to normal: flatten positions")         # adjust both the 1910 and 2001 contracts to no position
    t_1910.set_target_volume(0)
    t_2001.set_target_volume(0)

For a quick look at how to use TqSdk, see our ten-minute quick start guide.

Architecture

Features

The features TqSdk provides can support everything from simple to complex strategy programs:

  • Company-grade data operations: complete tick and K-line data for every currently tradable contract, from listing day onward
  • Live trading with 90% of the futures brokers on the market
  • Simulated trading support
  • Tick-level and K-line-level backtesting, including complex strategies
  • Nearly a hundred technical indicator functions, with source code
  • No database to build or maintain: quotes and trading data live in an in-memory database with zero access latency
  • Optimized support for pandas and numpy
  • No mandatory framework structure: strategies of any complexity are supported, and a single strategy program can use K-lines/live quotes for multiple instruments and trade multiple instruments
  • Developer support tools for marking trade signals on charts and plotting custom indicators

Installation

TqSdk supports only Python 3.6 and above. To install TqSdk, use pip:

$ pip install tqsdk

Documentation

Read the HTML documentation online: https://doc.shinnytech.com/tqsdk/latest

Online Q&A community: https://www.shinnytech.com/qa

Zhihu account [天勤量化]: https://www.zhihu.com/org/tian-qin-liang-hua/activities

User QQ group: 619870862 (currently restricted to users who have starred the repo; please provide your GitHub username when applying to join)

GUI

TqSdk has a built-in WEB_GUI feature: a single parameter enables the graphical interface. See web_gui for details.
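
A minimal sketch of enabling it (assuming the web_gui flag on the TqApi constructor that the docs describe):

from tqsdk import TqApi, TqAuth

# web_gui=True starts a local web server and prints its address;
# K-lines and the strategy's orders can then be inspected in a browser
api = TqApi(auth=TqAuth("信易账户", "账户密码"), web_gui=True)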

About us

Shinny Technology (信易科技) is a professional futures software vendor and an exchange-licensed market data provider. Its Kuaiqi (快期) product line has served the market for more than 10 years. TqSdk is part of the company's open-source program.

Does Python SciPy need BLAS?

Question: Does Python SciPy need BLAS?

numpy.distutils.system_info.BlasNotFoundError: 
    Blas (http://www.netlib.org/blas/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [blas]) or by setting
    the BLAS environment variable.

Which tar do I need to download off this site?

I’ve tried the fortrans, but I keep getting this error (after setting the environment variable obviously).


Answer 0

The SciPy webpage used to provide build and installation instructions, but the instructions there now rely on OS binary distributions. To build SciPy (and NumPy) on operating systems without precompiled packages of the required libraries, you must build and then statically link to the Fortran libraries BLAS and LAPACK:

mkdir -p ~/src/
cd ~/src/
wget http://www.netlib.org/blas/blas.tgz
tar xzf blas.tgz
cd BLAS-*

## NOTE: The selected Fortran compiler must be consistent for BLAS, LAPACK, NumPy, and SciPy.
## For GNU compiler on 32-bit systems:
#g77 -O2 -fno-second-underscore -c *.f                     # with g77
#gfortran -O2 -std=legacy -fno-second-underscore -c *.f    # with gfortran
## OR for GNU compiler on 64-bit systems:
#g77 -O3 -m64 -fno-second-underscore -fPIC -c *.f                     # with g77
gfortran -O3 -std=legacy -m64 -fno-second-underscore -fPIC -c *.f    # with gfortran
## OR for Intel compiler:
#ifort -FI -w90 -w95 -cm -O3 -unroll -c *.f

# Continue below irrespective of compiler:
ar r libfblas.a *.o
ranlib libfblas.a
rm -rf *.o
export BLAS=~/src/BLAS-*/libfblas.a

Execute only one of the five g77/gfortran/ifort commands. I have commented out all, but the gfortran which I use. The subsequent LAPACK installation requires a Fortran 90 compiler, and since both installs should use the same Fortran compiler, g77 should not be used for BLAS.

Next, you’ll need to install the LAPACK stuff. The SciPy webpage’s instructions helped me here as well, but I had to modify them to suit my environment:

mkdir -p ~/src
cd ~/src/
wget http://www.netlib.org/lapack/lapack.tgz
tar xzf lapack.tgz
cd lapack-*/
cp INSTALL/make.inc.gfortran make.inc          # On Linux with lapack-3.2.1 or newer
make lapacklib
make clean
export LAPACK=~/src/lapack-*/liblapack.a

Update on 3-Sep-2015: Verified some comments today (thanks to all): Before running make lapacklib edit the make.inc file and add -fPIC option to OPTS and NOOPT settings. If you are on a 64bit architecture or want to compile for one, also add -m64. It is important that BLAS and LAPACK are compiled with these options set to the same values. If you forget the -fPIC SciPy will actually give you an error about missing symbols and will recommend this switch. The specific section of make.inc looks like this in my setup:

FORTRAN  = gfortran 
OPTS     = -O2 -frecursive -fPIC -m64
DRVOPTS  = $(OPTS)
NOOPT    = -O0 -frecursive -fPIC -m64
LOADER   = gfortran

On old machines (e.g. RedHat 5), gfortran might be installed in an older version (e.g. 4.1.2) and does not understand option -frecursive. Simply remove it from the make.inc file in such cases.

The lapack test target of the Makefile fails in my setup because it cannot find the blas libraries. If you are thorough you can temporarily move the blas library to the specified location to test the lapack. I’m a lazy person, so I trust the devs to have it working and verify only in SciPy.


Answer 1

If you need to use the latest versions of SciPy rather than the packaged version, without going through the hassle of building BLAS and LAPACK, you can follow the below procedure.

Install linear algebra libraries from repository (for Ubuntu),

sudo apt-get install gfortran libopenblas-dev liblapack-dev

Then install SciPy (after downloading the SciPy source): python setup.py install, or

pip install scipy

As the case may be.


Answer 2

On Fedora, this works:

 yum install lapack lapack-devel blas blas-devel
 pip install numpy
 pip install scipy

Remember to install ‘lapack-devel‘ and ‘blas-devel‘ in addition to ‘blas’ and ‘lapack’ otherwise you’ll get the error you mentioned or the “numpy.distutils.system_info.LapackNotFoundError” error.


Answer 3

I guess you are talking about installation in Ubuntu. Just use:

apt-get install python-numpy python-scipy

That should take care of the BLAS libraries compiling as well. Else, compiling the BLAS libraries is very difficult.


Answer 4

For Windows users there is a nice binary package by Chris (warning: it’s a pretty large download, 191 MB):


Answer 5

Following the instructions given by ‘cfi’ works for me, although there are a few pieces they left out that you might need:

1) Your lapack directory, after unzipping, may be called lapack-X-Y (some version number), so you can just rename that to LAPACK.

cd ~/src
mv lapack-[tab] LAPACK

2) In that directory, you may need to do:

cd ~/src/LAPACK 
cp lapack_LINUX.a libflapack.a

Answer 6

Try using

sudo apt-get install python3-scipy

How do I find the thread ID in Python?

Question: How do I find the thread ID in Python?

I have a multi-threading Python program, and a utility function, writeLog(message), that writes out a timestamp followed by the message. Unfortunately, the resultant log file gives no indication of which thread is generating which message.

I would like writeLog() to be able to add something to the message to identify which thread is calling it. Obviously I could just make the threads pass this information in, but that would be a lot more work. Is there some thread equivalent of os.getpid() that I could use?


Answer 0

threading.get_ident() works, or threading.current_thread().ident (or threading.currentThread().ident for Python < 2.6).
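
Tying that back to the question's writeLog() helper, a small sketch (writeLog here is the asker's hypothetical function, not a library API):

import threading

def writeLog(message):
    # prefix each log line with the calling thread's identifier
    print("[%d] %s" % (threading.get_ident(), message))

writeLog("hello")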


Answer 1

Using the logging module you can automatically add the current thread identifier in each log entry. Just use one of these LogRecord mapping keys in your logger format string:

%(thread)d : Thread ID (if available).

%(threadName)s : Thread name (if available).

and set up your default handler with it:

logging.basicConfig(format="%(threadName)s:%(message)s")
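
A self-contained illustration of the setup above (a sketch, not part of the original answer):

import logging
import threading

logging.basicConfig(format="%(threadName)s:%(message)s", level=logging.INFO)

def worker():
    logging.info("hello from a worker")  # printed as e.g. "Thread-1:hello from a worker"

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()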

Answer 2

The thread.get_ident() function returns a long integer on Linux. It’s not really a thread id.

I use this method to really get the thread id on Linux:

import ctypes
libc = ctypes.cdll.LoadLibrary('libc.so.6')

# System dependent, see e.g. /usr/include/x86_64-linux-gnu/asm/unistd_64.h
SYS_gettid = 186

def getThreadId():
   """Returns OS thread id - Specific to Linux"""
   return libc.syscall(SYS_gettid)

Answer 4

I saw examples of thread IDs like this:

class myThread(threading.Thread):
    def __init__(self, threadID, name, counter):
        self.threadID = threadID
        ...

The threading module docs list the name attribute as well:

...

A thread has a name. 
The name can be passed to the constructor, 
and read or changed through the name attribute.

...

Thread.name

A string used for identification purposes only. 
It has no semantics. Multiple threads may
be given the same name. The initial name is set by the constructor.

Answer 5

You can get the ident of the current running thread. The ident could be reused for other threads, if the current thread ends.

When you create an instance of Thread, a name is implicitly given to the thread, following the pattern: Thread-number

The name has no meaning and doesn't have to be unique. The ident of all running threads is unique.

import threading


def worker():
    print(threading.current_thread().name)
    print(threading.get_ident())


threading.Thread(target=worker).start()
threading.Thread(target=worker, name='foo').start()

The function threading.current_thread() returns the current running thread. This object holds the whole information of the thread.


Answer 6

I created multiple threads in Python, I printed the thread objects, and I printed the id using the ident variable. I see all the ids are same:

<Thread(Thread-1, stopped 140500807628544)>
<Thread(Thread-2, started 140500807628544)>
<Thread(Thread-3, started 140500807628544)>

Answer 7

Similarly to @brucexin I needed to get OS-level thread identifier (which != thread.get_ident()) and use something like below not to depend on particular numbers and being amd64-only:

---- 8< ---- (xos.pyx)
"""module xos complements standard module os""" 

cdef extern from "<sys/syscall.h>":                                                             
    long syscall(long number, ...)                                                              
    const int SYS_gettid                                                                        

# gettid returns current OS thread identifier.                                                  
def gettid():                                                                                   
    return syscall(SYS_gettid)                                                                  

and

---- 8< ---- (test.py)
import pyximport; pyximport.install()
import xos

...

print 'my tid: %d' % xos.gettid()

this depends on Cython though.


Reading specific columns from a CSV file with the csv module?

Question: Reading specific columns from a CSV file with the csv module?

I’m trying to parse through a csv file and extract the data from only specific columns.

Example csv:

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

I’m trying to capture only specific columns, say ID, Name, Zip and Phone.

Code I’ve looked at has led me to believe I can call the specific column by its corresponding number, so ie: Name would correspond to 2 and iterating through each row using row[2] would produce all the items in column 2. Only it doesn’t.

Here’s what I’ve done so far:

import sys, argparse, csv
from settings import *

# command arguments
parser = argparse.ArgumentParser(description='csv to postgres',\
 fromfile_prefix_chars="@" )
parser.add_argument('file', help='csv file to import', action='store')
args = parser.parse_args()
csv_file = args.file

# open csv file
with open(csv_file, 'rb') as csvfile:

    # get number of columns
    for line in csvfile.readlines():
        array = line.split(',')
        first_item = array[0]

    num_columns = len(array)
    csvfile.seek(0)

    reader = csv.reader(csvfile, delimiter=' ')
        included_cols = [1, 2, 6, 7]

    for row in reader:
            content = list(row[i] for i in included_cols)
            print content

and I’m expecting that this will print out only the specific columns I want for each row except it doesn’t, I get the last column only.


Answer 0

The only way you would be getting the last column from this code is if you don’t include your print statement in your for loop.

This is most likely the end of your code:

for row in reader:
    content = list(row[i] for i in included_cols)
print content

You want it to be this:

for row in reader:
        content = list(row[i] for i in included_cols)
        print content

Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module.

Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:

import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']

so if you wanted to save all of the info in your column Names into a variable, this is all you need to do:

names = df.Names

It’s a great module and I suggest you look into it. If for some reason your print statement was in for loop and it was still only printing out the last column, which shouldn’t happen, but let me know if my assumption was wrong. Your posted code has a lot of indentation errors so it was hard to know what was supposed to be where. Hope this was helpful!


Answer 1

import csv
from collections import defaultdict

columns = defaultdict(list) # each value in each column is appended to a list

with open('file.txt') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        for (k,v) in row.items(): # go over each column name and value 
            columns[k].append(v) # append the value into the appropriate list
                                 # based on column name k

print(columns['name'])
print(columns['phone'])
print(columns['street'])

With a file like

name,phone,street
Bob,0893,32 Silly
James,000,400 McHilly
Smithers,4442,23 Looped St.

Will output

>>> 
['Bob', 'James', 'Smithers']
['0893', '000', '4442']
['32 Silly', '400 McHilly', '23 Looped St.']

Or alternatively if you want numerical indexing for the columns:

with open('file.txt') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row (reader.next() is Python 2 only)
    for row in reader:
        for (i,v) in enumerate(row):
            columns[i].append(v)
print(columns[0])

>>> 
['Bob', 'James', 'Smithers']

To change the deliminator add delimiter=" " to the appropriate instantiation, i.e reader = csv.reader(f,delimiter=" ")


Answer 2

Use pandas:

import pandas as pd
my_csv = pd.read_csv(filename)
column = my_csv.column_name
# you can also use my_csv['column_name']

Discard unneeded columns at parse time:

my_filtered_csv = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])

P.S. I'm just aggregating what others have said in a simple manner. Actual answers are taken from here and here.


Answer 3

With pandas you can use read_csv with usecols parameter:

df = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])

Example:

import pandas as pd
import io

s = '''
total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.5,Male,No,Sun,Dinner,3
'''

df = pd.read_csv(io.StringIO(s), usecols=['total_bill', 'day', 'size'])
print(df)

   total_bill  day  size
0       16.99  Sun     2
1       10.34  Sun     3
2       21.01  Sun     3

Answer 4

You can use numpy.loadtxt(filename). For example, if this is your database .csv:

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | Adam | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Carl | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Adolf | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Den | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

And you want the Name column:

import numpy as np 
b=np.loadtxt(r'filepath\name.csv',dtype=str,delimiter='|',skiprows=1,usecols=(1,))

>>> b
array([' Adam ', ' Carl ', ' Adolf ', ' Den '], 
      dtype='|S7')

More easily, you can use genfromtxt:

b = np.genfromtxt(r'filepath\name.csv', delimiter='|', names=True,dtype=None)
>>> b['Name']
array([' Adam ', ' Carl ', ' Adolf ', ' Den '], 
      dtype='|S7')

Answer 5

Context: For this type of work you should use the amazing python petl library. That will save you a lot of work and potential frustration from doing things ‘manually’ with the standard csv module. AFAIK, the only people who still use the csv module are those who have not yet discovered better tools for working with tabular data (pandas, petl, etc.), which is fine, but if you plan to work with a lot of data in your career from various strange sources, learning something like petl is one of the best investments you can make. To get started should only take 30 minutes after you’ve done pip install petl. The documentation is excellent.

Answer: Let’s say you have the first table in a csv file (you can also load directly from the database using petl). Then you would simply load it and do the following.

from petl import fromcsv, look, cut, tocsv 

#Load the table
table1 = fromcsv('table1.csv')
# Alter the colums
table2 = cut(table1, 'Song_Name','Artist_ID')
#have a quick look to make sure things are ok. Prints a nicely formatted table to your console
print look(table2)
# Save to new file
tocsv(table2, 'new.csv')

Answer 6

I think there is an easier way

import pandas as pd

dataset = pd.read_csv('table1.csv')
ftCol = dataset.iloc[:, 0].values

So in here iloc[:, 0], : means all values, 0 means the position of the column. in the example below ID will be selected

ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | C... | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |

Answer 7

import pandas as pd 
csv_file = pd.read_csv("file.csv") 
column_val_list = csv_file.column_name._ndarray_values

Answer 8

Thanks to the way you can index and subset a pandas dataframe, a very easy way to extract a single column from a csv file into a variable is:

myVar = pd.read_csv('YourPath', sep = ",")['ColumnName']

A few things to consider:

The snippet above will produce a pandas Series and not a dataframe. The suggestion from ayhan with usecols will also be faster if speed is an issue. Testing the two different approaches with %timeit on a 2122 KB csv file yields 22.8 ms for the usecols approach and 53 ms for my suggested approach.

And don’t forget import pandas as pd


Answer 9

If you need to process the columns separately, I like to destructure the columns with the zip(*iterable) pattern (effectively “unzip”). So for your example:

ids, names, zips, phones = zip(*(
  (row[1], row[2], row[6], row[7])
  for row in reader
))

Answer 10

To fetch the column names, use readline() instead of readlines(): that avoids looping over and reading the complete file just to store it in an array.

with open(csv_file, 'rb') as csvfile:

    # get number of columns

    line = csvfile.readline()

    first_item = line.split(',')

Python 3.x rounding behavior

Question: Python 3.x rounding behavior

I was just re-reading What’s New In Python 3.0 and it states:

The round() function rounding strategy and return type have changed. Exact halfway cases are now rounded to the nearest even result instead of away from zero. (For example, round(2.5) now returns 2 rather than 3.)

and the documentation for round:

For the built-in types supporting round(), values are rounded to the closest multiple of 10 to the power minus n; if two multiples are equally close, rounding is done toward the even choice

So, under v2.7.3:

In [85]: round(2.5)
Out[85]: 3.0

In [86]: round(3.5)
Out[86]: 4.0

as I’d have expected. However, now under v3.2.3:

In [32]: round(2.5)
Out[32]: 2

In [33]: round(3.5)
Out[33]: 4

This seems counter-intuitive and contrary to what I understand about rounding (and bound to trip up people). English isn’t my native language but until I read this I thought I knew what rounding meant :-/ I am sure at the time v3 was introduced there must have been some discussion of this, but I was unable to find a good reason in my search.

  1. Does anyone have insight into why this was changed to this?
  2. Are there any other mainstream programming languages (e.g., C, C++, Java, Perl, ..) that do this sort of (to me inconsistent) rounding?

What am I missing here?

UPDATE: @Li-aungYip’s comment re “Banker’s rounding” gave me the right search term/keywords to search for and I found this SO question: Why does .NET use banker’s rounding as default?, so I will be reading that carefully.


Answer 0

Python 3’s way (called “round half to even” or “banker’s rounding”) is considered the standard rounding method these days, though some language implementations aren’t on the bus yet.

The simple “always round 0.5 up” technique results in a slight bias toward the higher number. With large numbers of calculations, this can be significant. The Python 3.0 approach eliminates this issue.

There is more than one method of rounding in common use. IEEE 754, the international standard for floating-point math, defines five different rounding methods (the one used by Python 3.0 is the default). And there are others.

This behavior is not as widely known as it ought to be. AppleScript was, if I remember correctly, an early adopter of this rounding method. The round command in AppleScript offers several options, but round-toward-even is the default as it is in IEEE 754. Apparently the engineer who implemented the round command got so fed up with all the requests to “make it work like I learned in school” that he implemented just that: round 2.5 rounding as taught in school is a valid AppleScript command. 🙂
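
A quick illustration of both behaviors (a sketch; the decimal module is one way to get the schoolbook rule back):

from decimal import Decimal, ROUND_HALF_UP

# Python 3 rounds exact halves to the nearest even integer:
print([round(x) for x in (0.5, 1.5, 2.5, 3.5)])   # [0, 2, 2, 4]

# "round half away from zero", as taught in school:
print(Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP))   # 3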


Python Flask: how do I set the content type?

Question: Python Flask, how to set the content type

I am using Flask and I return an XML file from a get request. How do I set the content type to xml ?

e.g.

@app.route('/ajax_ddl')
def ajax_ddl():
    xml = 'foo'
    header("Content-type: text/xml")
    return xml

Answer 0

Try like this:

from flask import Response
@app.route('/ajax_ddl')
def ajax_ddl():
    xml = 'foo'
    return Response(xml, mimetype='text/xml')

The actual Content-Type is based on the mimetype parameter and the charset (defaults to UTF-8).

Response (and request) objects are documented here: http://werkzeug.pocoo.org/docs/wrappers/


Answer 1

As simple as this

x = "some data you want to return"
return x, 200, {'Content-Type': 'text/css; charset=utf-8'}

Hope it helps

Update: Use this method because it will work with both python 2.x and python 3.x

and secondly it also eliminates multiple header problem.

from flask import Response
r = Response(response="TEST OK", status=200, mimetype="application/xml")
r.headers["Content-Type"] = "text/xml; charset=utf-8"
return r

Answer 2

I like and upvoted @Simon Sapin’s answer. I ended up taking a slightly different tack, however, and created my own decorator:

from flask import Response
from functools import wraps

def returns_xml(f):
    @wraps(f)
    def decorated_function(*args, **kwargs):
        r = f(*args, **kwargs)
        return Response(r, content_type='text/xml; charset=utf-8')
    return decorated_function

and use it thus:

@app.route('/ajax_ddl')
@returns_xml
def ajax_ddl():
    xml = 'foo'
    return xml

I think this is slightly more comfortable.


Answer 3

Use the make_response method to get a response with your data. Then set the mimetype attribute. Finally return this response:

@app.route('/ajax_ddl')
def ajax_ddl():
    xml = 'foo'
    resp = app.make_response(xml)
    resp.mimetype = "text/xml"
    return resp

If you use Response directly, you lose the chance to customize responses by setting app.response_class. The make_response method uses app.response_class to build the response object, so you can create your own response class and have your application use it globally:

class MyResponse(app.response_class):
    def __init__(self, *args, **kwargs):
        super(MyResponse, self).__init__(*args, **kwargs)
        self.set_cookie("last-visit", time.ctime())

app.response_class = MyResponse  

Answer 4

from flask import Flask, render_template, make_response
app = Flask(__name__)

@app.route('/user/xml')
def user_xml():
    resp = make_response(render_template('xml/user.html', username='Ryan'))
    resp.headers['Content-type'] = 'text/xml; charset=utf-8'
    return resp

Answer 5

Usually you don’t have to create the Response object yourself because make_response() will take care of that for you.

from flask import Flask, make_response                                      
app = Flask(__name__)                                                       

@app.route('/')                                                             
def index():                                                                
    bar = '<body>foo</body>'                                                
    response = make_response(bar)                                           
    response.headers['Content-Type'] = 'text/xml; charset=utf-8'            
    return response

One more thing: it seems that no one has mentioned after_this_request, so I want to say something:

after_this_request

Executes a function after this request. This is useful to modify response objects. The function is passed the response object and has to return the same or a new one.

so we can do it with after_this_request, the code should look like this:

from flask import Flask, after_this_request
app = Flask(__name__)

@app.route('/')
def index():
    @after_this_request
    def add_header(response):
        response.headers['Content-Type'] = 'text/xml; charset=utf-8'
        return response
    return '<body>foobar</body>'

Answer 6

You can try the following methods (Python 3.6.2):

case one:

@app.route('/hello')
def hello():

    headers={ 'content-type':'text/plain' ,'location':'http://www.stackoverflow'}
    response = make_response('<h1>hello world</h1>',301)
    response.headers = headers
    return response

case two:

@app.route('/hello')
def hello():

    headers={ 'content-type':'text/plain' ,'location':'http://www.stackoverflow.com'}
    return '<h1>hello world</h1>',301,headers

I am using Flask. And if you want to return JSON, you can write this:

import json
@app.route('/search/<keyword>')
def search(keyword):

    result = Book.search_by_keyword(keyword)
    return json.dumps(result),200,{'content-type':'application/json'}

or with jsonify:

from flask import jsonify
@app.route('/search/<keyword>')
def search(keyword):

    result = Book.search_by_keyword(keyword)
    return jsonify(result)
