Column selections

Let's create a dataset to use throughout this section:

Python (`DataFrame`):

from datetime import date, datetime

import polars as pl

df = pl.DataFrame(
    {
        "id": [9, 4, 2],
        "place": ["Mars", "Earth", "Saturn"],
        "date": pl.date_range(date(2022, 1, 1), date(2022, 1, 3), "1d", eager=True),
        "sales": [33.4, 2142134.1, 44.7],
        "has_people": [False, True, False],
        "logged_at": pl.datetime_range(
            datetime(2022, 12, 1), datetime(2022, 12, 1, 0, 0, 2), "1s", eager=True
        ),
    }
).with_row_index("index")
print(df)

Rust (`DataFrame`):

use chrono::prelude::*;
use polars::time::*;

// logged_at covers 2022-12-01 00:00:00 to 00:00:02, as in the Python example
let df = df!(
    "id" => &[9, 4, 2],
    "place" => &["Mars", "Earth", "Saturn"],
    "date" => date_range("date",
        NaiveDate::from_ymd_opt(2022, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
        NaiveDate::from_ymd_opt(2022, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
        Duration::parse("1d"), ClosedWindow::Both, TimeUnit::Milliseconds, None)?,
    "sales" => &[33.4, 2142134.1, 44.7],
    "has_people" => &[false, true, false],
    "logged_at" => date_range("logged_at",
        NaiveDate::from_ymd_opt(2022, 12, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
        NaiveDate::from_ymd_opt(2022, 12, 1).unwrap().and_hms_opt(0, 0, 2).unwrap(),
        Duration::parse("1s"), ClosedWindow::Both, TimeUnit::Milliseconds, None)?,
)?
.with_row_index("index", None)?;
println!("{}", &df);

shape: (3, 7)
┌───────┬─────┬────────┬────────────┬───────────┬────────────┬─────────────────────┐
│ index ┆ id  ┆ place  ┆ date       ┆ sales     ┆ has_people ┆ logged_at           │
│ ---   ┆ --- ┆ ---    ┆ ---        ┆ ---       ┆ ---        ┆ ---                 │
│ u32   ┆ i64 ┆ str    ┆ date       ┆ f64       ┆ bool       ┆ datetime[μs]        │
╞═══════╪═════╪════════╪════════════╪═══════════╪════════════╪═════════════════════╡
│ 0     ┆ 9   ┆ Mars   ┆ 2022-01-01 ┆ 33.4      ┆ false      ┆ 2022-12-01 00:00:00 │
│ 1     ┆ 4   ┆ Earth  ┆ 2022-01-02 ┆ 2142134.1 ┆ true       ┆ 2022-12-01 00:00:01 │
│ 2     ┆ 2   ┆ Saturn ┆ 2022-01-03 ┆ 44.7      ┆ false      ┆ 2022-12-01 00:00:02 │
└───────┴─────┴────────┴────────────┴───────────┴────────────┴─────────────────────┘

Expression expansion

As we saw in the previous section, you can select specific columns with `pl.col`. It also serves as a convenient way to select multiple columns, and as a way to expand expressions.

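This convenience is more than decoration or syntactic sugar. It lets you apply the DRY principle very effectively: a single expression can refer to several columns and, depending on the DataFrame schema, expand into a list of expressions, so you can select multiple columns and run computations on all of them!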
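To make the idea concrete, here is a minimal stdlib-only sketch. It assumes a schema can be modelled as an ordered dict mapping column name to dtype name. The helper `expand` and the predicate `is_numeric` are hypothetical and are not Polars API. The point is only that one expression definition yields a different list of per-column expressions depending on the schema it is expanded against.

```python
# Toy model of expression expansion; this is not how Polars is implemented.
# A schema is an ordered dict: column name -> dtype name.
NUMERIC = {"UInt32", "Int64", "Float64"}  # only the dtypes used below


def expand(schema, predicate, op):
    """Turn one multi-column 'expression' into one entry per matching column.

    predicate(name, dtype) decides which columns the expression covers;
    op is just a label for the operation applied to each of them.
    """
    return [f"{op}({name})" for name, dtype in schema.items() if predicate(name, dtype)]


def is_numeric(name, dtype):
    return dtype in NUMERIC


schema_a = {"index": "UInt32", "id": "Int64", "place": "String", "sales": "Float64"}
schema_b = {"city": "String", "population": "Int64"}

# The same expression definition expands differently for each schema.
print(expand(schema_a, is_numeric, "mean"))  # ['mean(index)', 'mean(id)', 'mean(sales)']
print(expand(schema_b, is_numeric, "mean"))  # ['mean(population)']
```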

Select all, or exclude some

You can select every column of a DataFrame object by passing the `*` argument:

Python (`all`):

out = df.select(pl.col("*"))

# Is equivalent to
out = df.select(pl.all())
print(out)

Rust (`all`):

let out = df.clone().lazy().select([col("*")]).collect()?;
println!("{}", &out);

// Is equivalent to
let out = df.clone().lazy().select([all()]).collect()?;
println!("{}", &out);


Often we want to include all columns except a few. This is just as easy:

Python (`exclude`, https://docs.pola.rs/py-polars/html/reference/expressions/api/polars.exclude.html):

```python
out = df.select(pl.col("*").exclude("logged_at", "index"))
print(out)
```

Rust (`exclude`, https://docs.pola.rs/docs/rust/dev/polars_lazy/dsl/enum.Expr.html#method.exclude):

```rust
let out = df
    .clone()
    .lazy()
    .select([col("*").exclude(["logged_at", "index"])])
    .collect()?;
println!("{}", &out);
```
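Conceptually, `*` expands to every column in schema order, and `exclude` then removes the named ones. The following is a minimal stdlib-only sketch of that behaviour over the example's column names. `select_all` and `exclude` here are hypothetical helpers that only mimic the idea. They are not the Polars functions.

```python
# Toy model of pl.col("*") followed by .exclude(...); not Polars' own code.
COLUMNS = ["index", "id", "place", "date", "sales", "has_people", "logged_at"]


def select_all(columns):
    """'*' expands to every column, in schema order."""
    return list(columns)


def exclude(selected, *names):
    """Drop the given names from an already expanded selection."""
    dropped = set(names)
    return [c for c in selected if c not in dropped]


print(exclude(select_all(COLUMNS), "logged_at", "index"))
# ['id', 'place', 'date', 'sales', 'has_people']
```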



By multiple strings

Passing multiple column names expands the expression to every one of those columns:

Python (`dt.to_string`):

out = df.select(pl.col("date", "logged_at").dt.to_string("%Y-%h-%d"))
print(out)

Rust (`dt.to_string`, available on feature `temporal`):

let out = df
    .clone()
    .lazy()
    .select([cols(["date", "logged_at"]).dt().to_string("%Y-%h-%d")])
    .collect()?;
println!("{}", &out);

shape: (3, 2)
┌─────────────┬─────────────┐
│ date        ┆ logged_at   │
│ ---         ┆ ---         │
│ str         ┆ str         │
╞═════════════╪═════════════╡
│ 2022-Jan-01 ┆ 2022-Dec-01 │
│ 2022-Jan-02 ┆ 2022-Dec-01 │
│ 2022-Jan-03 ┆ 2022-Dec-01 │
└─────────────┴─────────────┘
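The expansion for several names can be sketched as follows. For illustration only, it assumes each listed name simply becomes its own single-column expression, in the order given. `expand_names` is a hypothetical helper. It raises `KeyError` for an unknown name, which stands in for the column-not-found error that Polars reports.

```python
# Toy model of pl.col("date", "logged_at"): one entry per listed name.
COLUMNS = ["index", "id", "place", "date", "sales", "has_people", "logged_at"]


def expand_names(names, op):
    """Return one '<op>(<name>)' entry per requested column, in the order given."""
    missing = [n for n in names if n not in COLUMNS]
    if missing:
        # Stand-in for the error Polars raises for an unknown column.
        raise KeyError(f"columns not found: {missing}")
    return [f"{op}({n})" for n in names]


print(expand_names(["date", "logged_at"], "dt.to_string"))
# ['dt.to_string(date)', 'dt.to_string(logged_at)']
```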

By regular expressions

You can also select multiple columns with a regular expression. It is important to wrap the regex in `^` and `$` so that `pl.col` knows to expect a regex selection:

Python:

out = df.select(pl.col("^.*(as|sa).*$"))
print(out)

Rust:

let out = df.clone().lazy().select([col("^.*(as|sa).*$")]).collect()?;
println!("{}", &out);

shape: (3, 2)
┌───────────┬────────────┐
│ sales     ┆ has_people │
│ ---       ┆ ---        │
│ f64       ┆ bool       │
╞═══════════╪════════════╡
│ 33.4      ┆ false      │
│ 2142134.1 ┆ true       │
│ 44.7      ┆ false      │
└───────────┴────────────┘
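Why the anchors matter can be sketched with Python's `re` module. This is a toy model, an assumption made for illustration and not Polars' code. In it, a string is treated as a regex only when it starts with `^` and ends with `$`; otherwise it is looked up as a literal column name. The helper `expand_col` is hypothetical.

```python
import re

# Toy model: only strings wrapped in ^...$ are treated as regular expressions.
COLUMNS = ["index", "id", "place", "date", "sales", "has_people", "logged_at"]


def expand_col(pattern):
    if pattern.startswith("^") and pattern.endswith("$"):
        regex = re.compile(pattern)
        return [c for c in COLUMNS if regex.search(c)]
    # Without the anchors the string is a literal column name
    # (simplified: Polars would report a missing column instead of []).
    return [pattern] if pattern in COLUMNS else []


print(expand_col("^.*(as|sa).*$"))  # ['sales', 'has_people']
print(expand_col(".*(as|sa).*"))    # [] because no column has that literal name
```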

By data type

`pl.col` can select multiple columns by Polars data type:

Python (`n_unique`):

out = df.select(pl.col(pl.Int64, pl.UInt32, pl.Boolean).n_unique())
print(out)

Rust (`n_unique`):

let out = df
    .clone()
    .lazy()
    .select([dtype_cols([DataType::Int64, DataType::UInt32, DataType::Boolean]).n_unique()])
    .collect()?;
// Gives a different result than Python: in Rust `id` is i32, so DataType::Int64 does not select it
println!("{}", &out);

shape: (1, 3)
┌───────┬─────┬────────────┐
│ index ┆ id  ┆ has_people │
│ ---   ┆ --- ┆ ---        │
│ u32   ┆ u32 ┆ u32        │
╞═══════╪═════╪════════════╡
│ 3     ┆ 3   ┆ 2          │
└───────┴─────┴────────────┘
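Selecting by dtype and then aggregating can be mirrored in plain Python. The idea is to keep the columns whose dtype is in the requested set and apply the same aggregation to each one. This sketch copies a subset of the example data; the temporal columns are omitted because they are not selected. `n_unique_by_dtype` is a hypothetical helper, not Polars API.

```python
# Toy model of pl.col(pl.Int64, pl.UInt32, pl.Boolean).n_unique().
# Subset of the example data: column name -> (dtype name, values).
DATA = {
    "index": ("UInt32", [0, 1, 2]),
    "id": ("Int64", [9, 4, 2]),
    "place": ("String", ["Mars", "Earth", "Saturn"]),
    "sales": ("Float64", [33.4, 2142134.1, 44.7]),
    "has_people": ("Boolean", [False, True, False]),
}


def n_unique_by_dtype(data, dtypes):
    """Count distinct values in every column whose dtype is requested."""
    wanted = set(dtypes)
    return {
        name: len(set(values))
        for name, (dtype, values) in data.items()
        if dtype in wanted
    }


print(n_unique_by_dtype(DATA, ["Int64", "UInt32", "Boolean"]))
# {'index': 3, 'id': 3, 'has_people': 2}
```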

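Using `selectors`

Polars also supports intuitive column selection based on a column's name, dtype, or other properties, building on the `col` functionality described above. We recommend importing `polars.selectors` under the alias `cs`.

By `dtype`

To select just the integer and string columns:

Python (`selectors`):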

import polars.selectors as cs

out = df.select(cs.integer(), cs.string())
print(out)

Rust (`selectors`):

// Not available in Rust; see the following link
// https://github.com/pola-rs/polars/issues/10594

shape: (3, 3)
┌───────┬─────┬────────┐
│ index ┆ id  ┆ place  │
│ ---   ┆ --- ┆ ---    │
│ u32   ┆ i64 ┆ str    │
╞═══════╪═════╪════════╡
│ 0     ┆ 9   ┆ Mars   │
│ 1     ┆ 4   ┆ Earth  │
│ 2     ┆ 2   ┆ Saturn │
└───────┴─────┴────────┘
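One way to picture `df.select(cs.integer(), cs.string())` is that each selector expands against the schema on its own and the results are concatenated in argument order. The stdlib sketch below assumes exactly that. `integer`, `string`, and `select` are hypothetical stand-ins, and the `INTEGER` dtype set is written out by hand.

```python
# Toy model of dtype-based selectors; not polars.selectors itself.
SCHEMA = {
    "index": "UInt32", "id": "Int64", "place": "String", "date": "Date",
    "sales": "Float64", "has_people": "Boolean", "logged_at": "Datetime",
}
INTEGER = {"Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64"}


def integer(schema):
    """Names of integer-typed columns, in schema order."""
    return [n for n, t in schema.items() if t in INTEGER]


def string(schema):
    """Names of string-typed columns, in schema order."""
    return [n for n, t in schema.items() if t == "String"]


def select(schema, *selectors):
    # Each selector expands on its own; results are joined in argument order.
    return [name for sel in selectors for name in sel(schema)]


print(select(SCHEMA, integer, string))  # ['index', 'id', 'place']
```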

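Applying set operations

These selectors also support set-based selection operations. For example, to select the numeric columns except the first column, which holds the row index:

Python (`cs.first`, `cs.numeric`):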

out = df.select(cs.numeric() - cs.first())
print(out)

Rust (`cs.first`, `cs.numeric`):

// Not available in Rust; see the following link
// https://github.com/pola-rs/polars/issues/10594

shape: (3, 2)
┌─────┬───────────┐
│ id  ┆ sales     │
│ --- ┆ ---       │
│ i64 ┆ f64       │
╞═════╪═══════════╡
│ 9   ┆ 33.4      │
│ 4   ┆ 2142134.1 │
│ 2   ┆ 44.7      │
└─────┴───────────┘
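The set arithmetic can be modelled by treating a selector as a predicate over each column's position, name, and dtype. In that model, `-` is difference, `|` is union, and `~` is complement. The `Sel` class below is a hypothetical stdlib-only sketch of that idea, not the `polars.selectors` implementation. The `NUMERIC` set lists only the dtypes present in the example.

```python
# Toy model of set operations on selectors (not polars.selectors itself).
# A selector is a predicate over (position, name, dtype); expansion keeps
# schema order.
SCHEMA = [
    ("index", "UInt32"), ("id", "Int64"), ("place", "String"), ("date", "Date"),
    ("sales", "Float64"), ("has_people", "Boolean"), ("logged_at", "Datetime"),
]
NUMERIC = {"UInt32", "Int64", "Float64"}


class Sel:
    def __init__(self, pred):
        self.pred = pred

    def __sub__(self, other):  # difference: in self but not in other
        return Sel(lambda i, n, t: self.pred(i, n, t) and not other.pred(i, n, t))

    def __or__(self, other):  # union: in either selector
        return Sel(lambda i, n, t: self.pred(i, n, t) or other.pred(i, n, t))

    def __invert__(self):  # complement relative to the whole schema
        return Sel(lambda i, n, t: not self.pred(i, n, t))

    def expand(self, schema):
        return [n for i, (n, t) in enumerate(schema) if self.pred(i, n, t)]


numeric = Sel(lambda i, n, t: t in NUMERIC)
first = Sel(lambda i, n, t: i == 0)

print((numeric - first).expand(SCHEMA))  # ['id', 'sales']
```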

We can also select the row index by name, together with every non-numeric column:

Python (`cs.by_name`, `cs.numeric`):

out = df.select(cs.by_name("index") | ~cs.numeric())
print(out)

Rust (`cs.by_name`, `cs.numeric`):

// Not available in Rust; see the following link
// https://github.com/pola-rs/polars/issues/10594

shape: (3, 5)
┌───────┬────────┬────────────┬────────────┬─────────────────────┐
│ index ┆ place  ┆ date       ┆ has_people ┆ logged_at           │
│ ---   ┆ ---    ┆ ---        ┆ ---        ┆ ---                 │
│ u32   ┆ str    ┆ date       ┆ bool       ┆ datetime[μs]        │
╞═══════╪════════╪════════════╪════════════╪═════════════════════╡
│ 0     ┆ Mars   ┆ 2022-01-01 ┆ false      ┆ 2022-12-01 00:00:00 │
│ 1     ┆ Earth  ┆ 2022-01-02 ┆ true       ┆ 2022-12-01 00:00:01 │
│ 2     ┆ Saturn ┆ 2022-01-03 ┆ false      ┆ 2022-12-01 00:00:02 │
└───────┴────────┴────────────┴────────────┴─────────────────────┘
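The same result can be checked with ordinary Python sets. Note that the complement `~cs.numeric()` is taken relative to the full set of columns in the schema. Because sets are unordered, the sketch re-sorts the result by schema position, which gives the column order shown in the output above. This is a hand-written check, not Polars code.

```python
# Toy check of cs.by_name("index") | ~cs.numeric() using Python sets.
COLUMNS = ["index", "id", "place", "date", "sales", "has_people", "logged_at"]
numeric = {"index", "id", "sales"}  # numeric columns in the example data

by_name = {"index"}
not_numeric = set(COLUMNS) - numeric  # complement relative to the schema
selected = by_name | not_numeric

# Sets lose column order, so restore it from the schema.
print(sorted(selected, key=COLUMNS.index))
# ['index', 'place', 'date', 'has_people', 'logged_at']
```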

By patterns and substrings

Selectors can also match substrings and regex patterns:

Python (`cs.contains`, `cs.matches`):

out = df.select(cs.contains("index"), cs.matches(".*_.*"))
print(out)

Rust (`cs.contains`, `cs.matches`):

// Not available in Rust; see the following link
// https://github.com/pola-rs/polars/issues/1059

shape: (3, 3)
┌───────┬────────────┬─────────────────────┐
│ index ┆ has_people ┆ logged_at           │
│ ---   ┆ ---        ┆ ---                 │
│ u32   ┆ bool       ┆ datetime[μs]        │
╞═══════╪════════════╪═════════════════════╡
│ 0     ┆ false      ┆ 2022-12-01 00:00:00 │
│ 1     ┆ true       ┆ 2022-12-01 00:00:01 │
│ 2     ┆ false      ┆ 2022-12-01 00:00:02 │
└───────┴────────────┴─────────────────────┘
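Here is a rough model of these two selectors. It assumes `contains` is a plain substring test and `matches` is an unanchored (search-style) regex match. Both helpers are hypothetical stdlib stand-ins, not the `polars.selectors` functions.

```python
import re

# Toy models of substring and regex column matching (stdlib only).
COLUMNS = ["index", "id", "place", "date", "sales", "has_people", "logged_at"]


def contains(substring):
    """Columns whose name contains the substring."""
    return [c for c in COLUMNS if substring in c]


def matches(pattern):
    """Columns whose name matches the regex anywhere (re.search semantics)."""
    regex = re.compile(pattern)
    return [c for c in COLUMNS if regex.search(c)]


print(contains("index") + matches(".*_.*"))
# ['index', 'has_people', 'logged_at']
```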

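Converting to expressions

To apply a specific operation to the selected columns (that is, to treat them as an expression as usual and keep working with them), simply convert them with `as_expr` and continue as normal:

Python (`cs.temporal`):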

out = df.select(cs.temporal().as_expr().dt.to_string("%Y-%h-%d"))
print(out)

Rust (`cs.temporal`):

// Not available in Rust; see the following link
// https://github.com/pola-rs/polars/issues/10594

shape: (3, 2)
┌─────────────┬─────────────┐
│ date        ┆ logged_at   │
│ ---         ┆ ---         │
│ str         ┆ str         │
╞═════════════╪═════════════╡
│ 2022-Jan-01 ┆ 2022-Dec-01 │
│ 2022-Jan-02 ┆ 2022-Dec-01 │
│ 2022-Jan-03 ┆ 2022-Dec-01 │
└─────────────┴─────────────┘

Debugging `selectors`

Polars provides two helpful utility functions for working with selectors, `is_selector` and `expand_selector`:

Python (`is_selector`):

from polars.selectors import is_selector

out = cs.temporal()
print(is_selector(out))

Rust (`is_selector`):

// Not available in Rust; see the following link
// https://github.com/pola-rs/polars/issues/10594

True

Knowing in advance which column names a selector will select is especially useful for LazyFrame objects:

Python (`expand_selector`):

from polars.selectors import expand_selector

out = cs.temporal()
print(expand_selector(df, out))

Rust (`expand_selector`):

// Not available in Rust; see the following link
// https://github.com/pola-rs/polars/issues/10594

('date', 'logged_at')
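This works for a LazyFrame because expansion needs only the schema (column names and dtypes), never the row data. The stdlib sketch below makes that explicit by expanding a temporal selection against a schema alone. The `expand` helper is hypothetical, and the set of dtypes counted as temporal is an assumption for illustration.

```python
# Toy model of expand_selector: only names and dtypes are needed, no rows.
TEMPORAL = {"Date", "Datetime", "Duration", "Time"}  # assumed temporal dtypes

# Schema only: no row data exists anywhere in this sketch.
schema = {
    "index": "UInt32", "id": "Int64", "place": "String", "date": "Date",
    "sales": "Float64", "has_people": "Boolean", "logged_at": "Datetime",
}


def expand(schema, dtypes):
    """Return the names a dtype-based selection would pick, in schema order."""
    return tuple(name for name, dtype in schema.items() if dtype in dtypes)


print(expand(schema, TEMPORAL))  # ('date', 'logged_at')
```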
