标签归档:header

如何将标题行添加到Pandas DataFrame

问题:如何将标题行添加到Pandas DataFrame

我正在将csv文件读入pandas。此csv文件由四列和一些行组成,但没有要添加的标题行。我一直在尝试以下方法:

Cov = pd.read_csv("path/to/file.txt", sep='\t')
Frame=pd.DataFrame([Cov], columns = ["Sequence", "Start", "End", "Coverage"])
Frame.to_csv("path/to/file.txt", sep='\t')

但是,当我应用代码时,出现以下错误:

ValueError: Shape of passed values is (1, 1), indices imply (4, 1)

错误的确切含义是什么?在python中将标题行添加到csv文件/ pandas df的一种干净方法是什么?

I am reading a csv file into pandas. This csv file constists of four columns and some rows, but does not have a header row, which I want to add. I have been trying the following:

Cov = pd.read_csv("path/to/file.txt", sep='\t')
Frame=pd.DataFrame([Cov], columns = ["Sequence", "Start", "End", "Coverage"])
Frame.to_csv("path/to/file.txt", sep='\t')

But when I apply the code, I get the following Error:

ValueError: Shape of passed values is (1, 1), indices imply (4, 1)

What exactly does the error mean? And what would be a clean way in python to add a header row to my csv file/pandas df?


回答 0

您可以names直接在read_csv

names:类似数组,默认为None要使用的列名列表。如果文件不包含标题行,则应显式传递header = None

Cov = pd.read_csv("path/to/file.txt", 
                  sep='\t', 
                  names=["Sequence", "Start", "End", "Coverage"])

You can use names directly in the read_csv

names : array-like, default None List of column names to use. If file contains no header row, then you should explicitly pass header=None

Cov = pd.read_csv("path/to/file.txt", 
                  sep='\t', 
                  names=["Sequence", "Start", "End", "Coverage"])

回答 1

另外,您可以阅读csv,header=None然后添加df.columns

Cov = pd.read_csv("path/to/file.txt", sep='\t', header=None)
Cov.columns = ["Sequence", "Start", "End", "Coverage"]

Alternatively you could read you csv with header=None and then add it with df.columns:

Cov = pd.read_csv("path/to/file.txt", sep='\t', header=None)
Cov.columns = ["Sequence", "Start", "End", "Coverage"]

回答 2

col_Names=["Sequence", "Start", "End", "Coverage"]
my_CSV_File= pd.read_csv("yourCSVFile.csv",names=col_Names)

完成此操作后,只需进行检查[显然,我知道,你知道。但是…

my_CSV_File.head()

希望对您有帮助…干杯

col_Names=["Sequence", "Start", "End", "Coverage"]
my_CSV_File= pd.read_csv("yourCSVFile.csv",names=col_Names)

having done this, just check it with[well obviously I know, u know that. But still…

my_CSV_File.head()

Hope it helps … Cheers


回答 3

要修复代码,您只需将更[Cov]改为Cov.values,第一个参数pd.DataFrame将变为多维numpy数组:

Cov = pd.read_csv("path/to/file.txt", sep='\t')
Frame=pd.DataFrame(Cov.values, columns = ["Sequence", "Start", "End", "Coverage"])
Frame.to_csv("path/to/file.txt", sep='\t')

但最聪明的解决方案仍然是pd.read_excelheader=None和一起使用names=columns_list

To fix your code you can simply change [Cov] to Cov.values, the first parameter of pd.DataFrame will become a multi-dimensional numpy array:

Cov = pd.read_csv("path/to/file.txt", sep='\t')
Frame=pd.DataFrame(Cov.values, columns = ["Sequence", "Start", "End", "Coverage"])
Frame.to_csv("path/to/file.txt", sep='\t')

But the smartest solution still is use pd.read_excel with header=None and names=columns_list.


Python文件的常见标头格式是什么?

问题:Python文件的常见标头格式是什么?

在有关Python编码准则的文档中,我遇到了以下Python源文件的头格式:

#!/usr/bin/env python

"""Foobar.py: Description of what foobar does."""

__author__      = "Barack Obama"
__copyright__   = "Copyright 2009, Planet Earth"

这是Python世界中标头的标准格式吗?我还可以在标题中添加哪些其他字段/信息?Python专家分享了有关好的Python源标头的指南:-)

I came across the following header format for Python source files in a document about Python coding guidelines:

#!/usr/bin/env python

"""Foobar.py: Description of what foobar does."""

__author__      = "Barack Obama"
__copyright__   = "Copyright 2009, Planet Earth"

Is this the standard format of headers in the Python world? What other fields/information can I put in the header? Python gurus share your guidelines for good Python source headers :-)


回答 0

它是Foobar模块的所有元数据。

第一个是docstring模块的,已经在Peter的答案中进行了解释。

如何组织模块(源文件)?(存档)

每个文件的第一行应为#!/usr/bin/env python这样就可以将文件作为脚本运行,例如在CGI上下文中隐式调用解释器。

接下来应该是带有说明的文档字符串。如果描述很长,那么第一行应该是一个简短的摘要,该摘要应具有其自身的意义,并以换行符分隔其余部分。

所有代码(包括import语句)都应遵循docstring。否则,解释器将无法识别该文档字符串,并且您将无法在交互式会话中访问该文档字符串(例如,通过obj.__doc__)或使用自动化工具生成文档时。

首先导入内置模块,然后导入第三方模块,然后再导入路径和您自己的模块。特别是,模块的路径和名称的添加可能会快速更改:将它们放在一个位置可以使它们更容易找到。

接下来应该是作者信息。此信息应遵循以下格式:

__author__ = "Rob Knight, Gavin Huttley, and Peter Maxwell"
__copyright__ = "Copyright 2007, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Gavin Huttley",
                    "Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.0.1"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

状态通常应为“原型”,“开发”或“生产”之一。__maintainer__应该是将修复错误并进行改进(如果导入)的人。__credits__不同之处在于__author__,这些__credits__人报告了错误修复,提出了建议等,但实际上并未编写代码。

在这里你有更多的信息,上市__author____authors____contact____copyright____license____deprecated____date____version__作为公认的元数据。

Its all metadata for the Foobar module.

The first one is the docstring of the module, that is already explained in Peter’s answer.

How do I organize my modules (source files)? (Archive)

The first line of each file shoud be #!/usr/bin/env python. This makes it possible to run the file as a script invoking the interpreter implicitly, e.g. in a CGI context.

Next should be the docstring with a description. If the description is long, the first line should be a short summary that makes sense on its own, separated from the rest by a newline.

All code, including import statements, should follow the docstring. Otherwise, the docstring will not be recognized by the interpreter, and you will not have access to it in interactive sessions (i.e. through obj.__doc__) or when generating documentation with automated tools.

Import built-in modules first, followed by third-party modules, followed by any changes to the path and your own modules. Especially, additions to the path and names of your modules are likely to change rapidly: keeping them in one place makes them easier to find.

Next should be authorship information. This information should follow this format:

__author__ = "Rob Knight, Gavin Huttley, and Peter Maxwell"
__copyright__ = "Copyright 2007, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Gavin Huttley",
                    "Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.0.1"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

Status should typically be one of “Prototype”, “Development”, or “Production”. __maintainer__ should be the person who will fix bugs and make improvements if imported. __credits__ differs from __author__ in that __credits__ includes people who reported bug fixes, made suggestions, etc. but did not actually write the code.

Here you have more information, listing __author__, __authors__, __contact__, __copyright__, __license__, __deprecated__, __date__ and __version__ as recognized metadata.


回答 1

我强烈赞成使用最少的文件头,我的意思是:

  • #!如果是可执行脚本,则使用hashbang(行)
  • 模块文档字符串
  • 导入,以标准方式分组,例如:
  import os    # standard library
  import sys

  import requests  # 3rd party packages

  import mypackage.mymodule  # local source
  import mypackage.myothermodule  

即。三组进口,它们之间有一个空白行。在每个组中,对进口进行排序。最后一组,从本地来源进口,可以是如图所示的绝对进口,也可以是明确的相对进口。

其他所有内容都浪费时间,占用视觉空间,并且会产生误导。

如果您有法律免责声明或许可信息,它将进入一个单独的文件中。它不需要感染每个源代码文件。您的版权应该是其中的一部分。人们应该可以在您的网站上找到它LICENSE文件中,而不是随机的源代码。

源代码控件已经维护了诸如作者身份和日期之类的元数据。无需在文件本身中添加更详细,错误且过时的相同信息版本。

我不认为每个人都需要将任何其他数据放入其所有源文件中。您可能有这样做的特定要求,但是根据定义,这种情况仅适用于您。它们在“为所有人推荐的通用标题”中没有位置。

I strongly favour minimal file headers, by which I mean just:

  • The hashbang (#! line) if this is an executable script
  • Module docstring
  • Imports, grouped in the standard way, eg:
  import os    # standard library
  import sys

  import requests  # 3rd party packages

  import mypackage.mymodule  # local source
  import mypackage.myothermodule  

ie. three groups of imports, with a single blank line between them. Within each group, imports are sorted. The final group, imports from local source, can either be absolute imports as shown, or explicit relative imports.

Everything else is a waste of time, visual space, and is actively misleading.

If you have legal disclaimers or licencing info, it goes into a separate file. It does not need to infect every source code file. Your copyright should be part of this. People should be able to find it in your LICENSE file, not random source code.

Metadata such as authorship and dates is already maintained by your source control. There is no need to add a less-detailed, erroneous, and out-of-date version of the same info in the file itself.

I don’t believe there is any other data that everyone needs to put into all their source files. You may have some particular requirement to do so, but such things apply, by definition, only to you. They have no place in “general headers recommended for everyone”.


回答 2

上面的答案确实很完整,但是如果您想要一个快速且脏的标题来复制粘贴,请使用以下命令:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Module documentation goes here
   and here
   and ...
"""

为什么这是一个好人:

  • 第一行适用于* nix用户。它将在用户路径中选择Python解释器,因此将自动选择用户首选的解释器。
  • 第二个是文件编码。如今,每个文件都必须具有关联的编码。UTF-8可以在任何地方使用。只是遗留项目会使用其他编码。
  • 和一个非常简单的文档。它可以填充多行。

也可以看看: https //www.python.org/dev/peps/pep-0263/

如果仅在每个文件中编写一个类,则甚至不需要文档(它会放在类doc中)。

The answers above are really complete, but if you want a quick and dirty header to copy’n paste, use this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Module documentation goes here
   and here
   and ...
"""

Why this is a good one:

  • The first line is for *nix users. It will choose the Python interpreter in the user path, so will automatically choose the user preferred interpreter.
  • The second one is the file encoding. Nowadays every file must have a encoding associated. UTF-8 will work everywhere. Just legacy projects would use other encoding.
  • And a very simple documentation. It can fill multiple lines.

See also: https://www.python.org/dev/peps/pep-0263/

If you just write a class in each file, you don’t even need the documentation (it would go inside the class doc).


回答 3

如果使用的是非ASCII字符集,另请参阅PEP 263

抽象

该PEP建议引入一种语法,以声明Python源文件的编码。然后,Python解析器将使用编码信息使用给定的编码来解释文件。最值得注意的是,这增强了源代码中Unicode文字的解释,并使得可以在例如Unicode的编辑器中直接使用UTF-8编写Unicode文字。

问题

在Python 2.1中,只能使用基于Latin-1的编码“ unicode-escape”编写Unicode文字。这使得编程环境对在许多亚洲国家这样的非拉丁1语言环境中生活和工作的Python用户而言相当不利。程序员可以使用最喜欢的编码来编写其8位字符串,但是绑定到Unicode文字的“ unicode-escape”编码。

拟议的解决方案

我建议通过在文件顶部使用特殊注释来声明编码,从而使每个源文件都可以看到和更改Python源代码。

为了使Python知道此编码声明,在处理Python源代码数据方面需要进行一些概念上的更改。

定义编码

如果未提供其他编码提示,Python将默认使用ASCII作为标准编码。

要定义源代码编码,必须将魔术注释作为源文件的第一行或第二行放置在源文件中,例如:

      # coding=<encoding name>

或(使用流行的编辑器认可的格式)

      #!/usr/bin/python
      # -*- coding: <encoding name> -*-

要么

      #!/usr/bin/python
      # vim: set fileencoding=<encoding name> :

Also see PEP 263 if you are using a non-ascii characterset

Abstract

This PEP proposes to introduce a syntax to declare the encoding of a Python source file. The encoding information is then used by the Python parser to interpret the file using the given encoding. Most notably this enhances the interpretation of Unicode literals in the source code and makes it possible to write Unicode literals using e.g. UTF-8 directly in an Unicode aware editor.

Problem

In Python 2.1, Unicode literals can only be written using the Latin-1 based encoding “unicode-escape”. This makes the programming environment rather unfriendly to Python users who live and work in non-Latin-1 locales such as many of the Asian countries. Programmers can write their 8-bit strings using the favorite encoding, but are bound to the “unicode-escape” encoding for Unicode literals.

Proposed Solution

I propose to make the Python source code encoding both visible and changeable on a per-source file basis by using a special comment at the top of the file to declare the encoding.

To make Python aware of this encoding declaration a number of concept changes are necessary with respect to the handling of Python source code data.

Defining the Encoding

Python will default to ASCII as standard encoding if no other encoding hints are given.

To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file, such as:

      # coding=<encoding name>

or (using formats recognized by popular editors)

      #!/usr/bin/python
      # -*- coding: <encoding name> -*-

or

      #!/usr/bin/python
      # vim: set fileencoding=<encoding name> :