## Question: How do I split one column into two?

I have a data frame with one (string) column and I’d like to split it into two (string) columns, with one column header as `'fips'` and the other `'row'`.

My dataframe `df` looks like this:

I do not know how to use `df.row.str[:]` to achieve my goal of splitting the row cell. I can use `df['fips'] = hello` to add a new column and populate it with `hello`. Any ideas?

## Answer 0

There might be a better way, but here’s one approach:

``````
# n=1 splits only on the first space, so multi-word names stay intact
df = pd.DataFrame(df.row.str.split(' ', n=1).tolist(),
                  columns=['fips', 'row'])
``````
``````
    fips                 row
0  00000       UNITED STATES
1  01000             ALABAMA
2  01001  Autauga County, AL
3  01003  Baldwin County, AL
4  01005  Barbour County, AL
``````

## Answer 1

# TL;DR version:

For the simple case of:

• I have a text column with a delimiter and I want two columns

The simplest solution is:

``````
df[['A', 'B']] = df['AB'].str.split(' ', n=1, expand=True)
``````

You must use `expand=True` if your strings have a non-uniform number of splits and you want `None` to replace the missing values.

Notice how, in either case, the `.tolist()` method is not necessary. Neither is `zip()`.

# In detail:

Andy Hayden’s solution is most excellent in demonstrating the power of the `str.extract()` method.

But for a simple split over a known separator (like, splitting by dashes, or splitting by whitespace), the `.str.split()` method is enough [1]. It operates on a column (Series) of strings, and returns a column (Series) of lists:

``````
>>> import pandas as pd
>>> df = pd.DataFrame({'AB': ['A1-B1', 'A2-B2']})
>>> df
      AB
0  A1-B1
1  A2-B2
>>> df['AB_split'] = df['AB'].str.split('-')
>>> df
      AB  AB_split
0  A1-B1  [A1, B1]
1  A2-B2  [A2, B2]
``````

1: If you’re unsure what the first two parameters of `.str.split()` do, I recommend the docs for the plain Python version of the method.

But how do you go from:

• a column containing two-element lists

to:

• two columns, each containing the respective element of the lists?

Well, we need to take a closer look at the `.str` attribute of a column.

It’s a magical object that is used to collect methods that treat each element in a column as a string, and then apply the respective method to each element as efficiently as possible:

``````
>>> upper_lower_df = pd.DataFrame({"U": ["A", "B", "C"]})
>>> upper_lower_df
   U
0  A
1  B
2  C
>>> upper_lower_df["L"] = upper_lower_df["U"].str.lower()
>>> upper_lower_df
   U  L
0  A  a
1  B  b
2  C  c
``````

But it also has an “indexing” interface for getting each element of a string by its index:

``````
>>> df['AB'].str[0]
0    A
1    A
Name: AB, dtype: object

>>> df['AB'].str[1]
0    1
1    2
Name: AB, dtype: object
``````

Of course, this indexing interface of `.str` doesn’t really care if each element it’s indexing is actually a string, as long as it can be indexed, so:

``````
>>> df['AB'].str.split('-', n=1).str[0]
0    A1
1    A2
Name: AB, dtype: object

>>> df['AB'].str.split('-', n=1).str[1]
0    B1
1    B2
Name: AB, dtype: object
``````

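Then, it’s a simple matter of taking advantage of Python’s tuple unpacking of iterables. Note that this relies on iterating over the `.str` accessor, which newer pandas releases deprecated and later removed, so on a current install index with `.str[0]` and `.str[1]` as shown earlier, or use `expand=True`.

``````
>>> df['A'], df['B'] = df['AB'].str.split('-', n=1).str
>>> df
      AB  AB_split   A   B
0  A1-B1  [A1, B1]  A1  B1
1  A2-B2  [A2, B2]  A2  B2
``````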

Of course, getting a DataFrame out of splitting a column of strings is so useful that the `.str.split()` method can do it for you with the `expand=True` parameter:

``````
>>> df['AB'].str.split('-', n=1, expand=True)
    0   1
0  A1  B1
1  A2  B2
``````

So, another way of accomplishing what we wanted is to do:

``````
>>> df = df[['AB']]
>>> df
      AB
0  A1-B1
1  A2-B2

>>> df.join(df['AB'].str.split('-', n=1, expand=True).rename(columns={0:'A', 1:'B'}))
      AB   A   B
0  A1-B1  A1  B1
1  A2-B2  A2  B2
``````

The `expand=True` version, although longer, has a distinct advantage over the tuple unpacking method. Tuple unpacking doesn’t deal well with splits of different lengths:

But `expand=True` handles it nicely by placing `None` in the columns for which there aren’t enough “splits”:
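
The example that belongs here did not survive, so here is a minimal sketch of the uneven-split case. The sample strings and the column names `A`, `B`, `C` are assumptions made up for illustration:

```python
import pandas as pd

# Hypothetical data: the last row has three fields instead of two.
uneven = pd.DataFrame({"AB": ["A1-B1", "A2-B2", "A1-B1-C1"]})

# Without n=, every "-" is a split point, so the widest row decides how
# many columns come back; shorter rows are padded with missing values.
parts = uneven["AB"].str.split("-", expand=True)
parts.columns = ["A", "B", "C"]

result = uneven.join(parts)
print(result)
```

The first two rows end up with a missing value in `C`, while the third row keeps all three pieces.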

## Answer 2

You can extract the different parts out quite neatly using a regex pattern:

To explain the somewhat long regex:

``````
(?P<fips>\d{5})
``````
• Matches the five digits (`\d`) and names them `"fips"`.

The next part:

``````
((?P<state>[A-Z ]*$)|(?P<county>.*?), (?P<state_code>[A-Z]{2}$))
``````

Does one of two things (`|`):

``````
(?P<state>[A-Z ]*$)
``````
• Matches any number (`*`) of capital letters or spaces (`[A-Z ]`) and names this `"state"` before the end of the string (`$`),

or

``````
(?P<county>.*?), (?P<state_code>[A-Z]{2}$)
``````
• matches anything else (`.*?`) then
• a comma and a space then
• matches the two-letter `state_code` before the end of the string (`$`).

In the example, the first two rows hit the “state” branch (leaving NaN in the county and state_code columns), whilst the last three hit the county, state_code branch (leaving NaN in the state column).
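
The full `extract` call is not shown above, so the following is only a sketch: it assumes the fragments are concatenated in the order they were explained (with a plain `$` end-of-string anchor) and reuses a few sample rows from the output shown earlier on this page.

```python
import pandas as pd

# Sample rows taken from the output shown earlier on this page.
df = pd.DataFrame({"row": ["00000 UNITED STATES",
                           "01000 ALABAMA",
                           "01001 Autauga County, AL"]})

# Assumed assembly of the regex fragments explained above.
pattern = (r"(?P<fips>\d{5})"
           r"((?P<state>[A-Z ]*$)|(?P<county>.*?), (?P<state_code>[A-Z]{2}$))")

# Named groups become column names; the unnamed outer group gets an
# integer label, so select only the named columns for display.
extracted = df["row"].str.extract(pattern)
print(extracted[["fips", "state", "county", "state_code"]])
```

Because the pattern has no separator after the five digits, the captured `state` and `county` values keep their leading space; a `.str.strip()` afterwards would tidy that up.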

## Answer 3

``````
df[['fips', 'row']] = df['row'].str.split(' ', n=1, expand=True)
``````

## Answer 4

If you don’t want to create a new dataframe, or if your dataframe has more columns than just the ones you want to split, you could:

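``````
# n=1 splits on the first whitespace only; without it, zip() would
# silently truncate multi-word names such as "UNITED STATES"
df["fips"], df["row_name"] = zip(*df["row"].str.split(n=1).tolist())
del df["row"]
``````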

## Answer 5

You can use `str.split` by whitespace (the default separator) with the parameter `expand=True` to get a `DataFrame`, and assign it to new columns:

If you also need to remove the original column, use `DataFrame.pop`:

This is the same as:
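
The original snippets are missing here, so this is a minimal sketch of the two equivalent variants. The sample rows and the new column names `fips` and `name` are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({"row": ["00000 UNITED STATES", "01000 ALABAMA"]})

# Variant 1: pop() removes "row" from the frame and hands it to str.split.
df1 = df.copy()
df1[["fips", "name"]] = df1.pop("row").str.split(n=1, expand=True)

# Variant 2: split first, then drop the original column explicitly.
df2 = df.copy()
df2[["fips", "name"]] = df2["row"].str.split(n=1, expand=True)
df2 = df2.drop(columns="row")

print(df1.equals(df2))
```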

If you get the error:

ValueError: Columns must be same length as key

you can check the result of the split; it returns a 4-column `DataFrame`, not just 2:

The solution is then to append the new `DataFrame` with `join`:

And to remove the original column as well (if there are other columns too):
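
As a rough sketch of that situation (the sample rows are assumed, and the joined columns simply keep their default integer labels):

```python
import pandas as pd

df = pd.DataFrame({"row": ["00000 UNITED STATES",
                           "01001 Autauga County, AL"]})

# No n= limit: "01001 Autauga County, AL" has four whitespace-separated
# fields, so the split yields four columns rather than two.
wide = df["row"].str.split(expand=True)
print(wide.shape)

# Assigning those four columns to two keys is what triggers the
# ValueError, so join the whole result instead.
joined = df.join(wide)

# Variant that also removes the original column.
trimmed = df.copy()
trimmed = trimmed.join(trimmed.pop("row").str.split(expand=True))
print(trimmed)
```

If you only ever want two columns, passing `n=1` to `str.split` avoids the problem in the first place.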

## Answer 6

If you want to split a string into more than two columns based on a delimiter you can omit the ‘maximum splits’ parameter.
You can use:

``````
df['column_name'].str.split('/', expand=True)
``````

This will automatically create as many columns as the maximum number of fields included in any of your initial strings.

## Answer 7

Surprised I haven’t seen this one yet. If you only need two splits, I highly recommend...

# `Series.str.partition`

`partition` performs one split on the separator, and is generally quite performant.

If you need to rename the columns,

If you need to join this back to the original, use `join` or `concat`:

``````
df.join(df['row'].str.partition(' ')[[0, 2]])
``````
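
As a sketch of the rename-and-join steps mentioned above (the sample rows and the column names `fips` and `name` are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"row": ["00000 UNITED STATES",
                           "01001 Autauga County, AL"]})

# partition returns three columns: 0 = text before the first separator,
# 1 = the separator itself, 2 = everything after it.
parts = df["row"].str.partition(" ")

# Keep the two outer pieces and give them readable names.
named = parts[[0, 2]].rename(columns={0: "fips", 2: "name"})
result = df.join(named)
print(result)
```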

## Answer 8

I prefer exporting the corresponding pandas series (i.e. the columns I need), using the apply function to split the column content into multiple series and then join the generated columns to the existing DataFrame. Of course, the source column should be removed.

e.g.

To split two-word strings, the function should be something like this:
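
The snippets for this answer are missing, so here is one possible sketch of the `apply`-then-`join` approach. The helper names `first_word` and `rest_of_text`, the sample rows, and the new column names are all hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"row": ["00000 UNITED STATES", "01000 ALABAMA"]})

def first_word(text: str) -> str:
    """Return everything before the first space."""
    return text.split(" ", 1)[0]

def rest_of_text(text: str) -> str:
    """Return everything after the first space."""
    return text.split(" ", 1)[-1]

# Build one named Series per target column.
fips = df["row"].apply(first_word).rename("fips")
name = df["row"].apply(rest_of_text).rename("name")

# Join the new columns, then remove the source column.
df = df.join(fips).join(name).drop(columns="row")
print(df)
```

Limiting the split to the first space is a choice made here so that multi-word names survive; for strictly two-word strings a plain `split(" ")` would behave the same.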

## Answer 9

I saw that no one had used the slice method, so here are my 2 cents.

``````
df["<col_name>"].str.slice(stop=5)
df["<col_name>"].str.slice(start=6)
``````

Each call returns a new Series (the first five characters, and everything from position 6 onward), which you can assign to two new columns.
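
For instance, a small sketch that assumes a fixed-width layout (a five-character code, one space, then the name) and uses made-up column names:

```python
import pandas as pd

df = pd.DataFrame({"row": ["00000 UNITED STATES", "01000 ALABAMA"]})

# Characters 0-4 hold the code; position 5 is the space; the rest is the name.
df["fips"] = df["row"].str.slice(stop=5)
df["name"] = df["row"].str.slice(start=6)
print(df)
```

This only works when every code has the same width; for variable-width prefixes, splitting on the delimiter is the safer option.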

## Answer 10

Use `df.assign` to create a new df. See http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
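
A minimal sketch of what that could look like, assuming the split itself is done with `str.split` and using illustrative column names:

```python
import pandas as pd

df = pd.DataFrame({"row": ["00000 UNITED STATES", "01000 ALABAMA"]})

# Split once on the first space, then build a new frame with assign();
# the original df is left untouched.
parts = df["row"].str.split(" ", n=1, expand=True)
new_df = df.assign(fips=parts[0], name=parts[1])
print(new_df)
```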

