# Migration guide
See the release notes on GitHub for comprehensive information about the content of each Kedro release.
## Migrate an existing project that uses Kedro 0.19.* to use 1.*

### Using Kedro as a framework
If you're using Kedro as a framework, update your Kedro project to work with Kedro 1.0.0 by following these steps:
- Update your project's `kedro_init_version` in `pyproject.toml` to `1.0.0`:

```diff
[tool.kedro]
package_name = "my_project"
project_name = "my-project"
- kedro_init_version = "0.19.14"
+ kedro_init_version = "1.0.0"
```
- Update your `src/pipelines/<pipeline_name>/pipeline.py` file to use `Node()` and `Pipeline()` to initialise your nodes and pipelines. While the wrapper functions `node()` and `pipeline()` still work, `Node()` and `Pipeline()` are the preferred way to create nodes and pipelines in Kedro 1.0.0. If the `pipeline()` function is used, make sure it is imported from `kedro.pipeline` instead of `kedro.pipeline.modular_pipeline`.

```diff
- from kedro.pipeline.modular_pipeline import node, pipeline  # Old import
+ from kedro.pipeline import Node, Pipeline  # New import

from .nodes import create_model_input_table, preprocess_companies, preprocess_shuttles


def create_pipeline(**kwargs) -> Pipeline:
-    return pipeline(
+    return Pipeline(
        [
-            node(
+            Node(
                func=preprocess_companies,
                inputs="companies",
                outputs="preprocessed_companies",
                name="preprocess_companies_node",
            ),
-            node(
+            Node(
                func=preprocess_shuttles,
                inputs="shuttles",
                outputs="preprocessed_shuttles",
                name="preprocess_shuttles_node",
            ),
-            node(
+            Node(
                func=create_model_input_table,
                inputs=["preprocessed_shuttles", "preprocessed_companies", "reviews"],
                outputs="model_input_table",
                name="create_model_input_table_node",
            ),
        ]
    )
```
- If you keep using the `pipeline()` function, make sure to rename the first argument from `pipe` to `nodes` to be consistent with the argument names of the `Pipeline` class:

```diff
- pipeline(pipe=[node1, node2])
+ pipeline(nodes=[node1, node2])
```
- The `--namespace` argument of the `kedro run` command was removed in favour of `--namespaces`, which accepts multiple namespaces. If you used the `--namespace` argument, change it to `--namespaces` and pass a comma-separated list of namespaces. For example, if you used this command:

```bash
kedro run --namespace=preprocessing
```

You should now use the following:

```bash
kedro run --namespaces=preprocessing
```
- The `kedro catalog create` command was removed in Kedro 1.0.0.
- If you were using the experimental `KedroDataCatalog` class, it was renamed to `DataCatalog` in Kedro 1.0.0. You need to remove the following lines from your `settings.py` file:

```diff
- from kedro.io import KedroDataCatalog
- DATA_CATALOG_CLASS = KedroDataCatalog
```
### Using Kedro as a library
If you're using Kedro as a library, you might need to make the following changes to your workflow:
- Rename the `extra_params` argument to `runtime_params` in `KedroSession`:

```diff
with KedroSession.create(
    project_path=project_path,
-    extra_params={"param1": "value1", "param2": "value2"},
+    runtime_params={"param1": "value1", "param2": "value2"},
) as session:
    session.run()
```
DataCatalog methods and CLI commands have been removed in Kedro version 1.0.
Please update your code and workflows accordingly. Where possible, recommended alternatives are provided.
| Deprecated Item | Type | Replacement / Notes |
|---|---|---|
| `catalog._get_dataset()` | Method | Internal use only; use `catalog.get()` instead |
| `catalog.add_all()` | Method | Prefer explicit catalog construction or use `catalog.add()` |
| `catalog.add_feed_dict()` | Method | Use `catalog["my_dataset"] = ...` (dict-style assignment) |
| `catalog.list()` | Method | Replaced by `catalog.filter()` |
| `catalog.shallow_copy()` | Method | Removed; no longer needed after internal refactor |
### Other API changes
The following API changes might be relevant to advanced users of Kedro or plugin developers:
- Kedro 1.0.0 made the private functions `_is_project` and `_find_kedro_project` public. To update, use `is_kedro_project` and `find_kedro_project` respectively.
- Renamed instances of `extra_params` and `_extra_params` to `runtime_params` in `KedroSession`, `KedroContext` and `PipelineSpecs`. To update, use `runtime_params` when creating a `KedroSession` or `KedroContext`, and in pipeline hooks such as `before_pipeline_run`, `after_pipeline_run` and `on_pipeline_error`.
- Removed the `modular_pipeline` module and moved its functionality to the `pipeline` module. Change any imports to use `kedro.pipeline` instead of `kedro.pipeline.modular_pipeline`.
- Renamed the first argument of the `pipeline()` function from `pipe` to `nodes` to be consistent with the argument names of the `Pipeline` class.
- Renamed `ModularPipelineError` to `PipelineError`.
- Renamed the `session_id` parameter to `run_id` in all runner methods and hooks.
## Migrate an existing project that uses Kedro 0.18.* to use 0.19.*

### Custom syntax for `--params` was removed
Kedro 0.19.0 removed the custom Kedro syntax for `--params`. To update, you need to use the OmegaConf syntax instead by replacing `:` with `=`.

If you used this command to pass parameters to `kedro run`:

```bash
kedro run --params=param_key1:value1,param_key2:2.0
```

You should now use:

```bash
kedro run --params=param_key1=value1,param_key2=2.0
```
For more information see "How to specify parameters at runtime".
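To see the shape the new `key=value` syntax produces, here is a small, purely illustrative parser. This is not Kedro's internal code (Kedro delegates to OmegaConf, which also supports dotted keys for nested parameters); it only demonstrates the resulting structure:

```python
# Illustrative only: a hypothetical parser showing the shape of the
# OmegaConf-style "key=value" runtime-params syntax. NOT Kedro's code.
def parse_runtime_params(spec: str) -> dict:
    """Parse 'a=1,b.c=2' into {'a': '1', 'b': {'c': '2'}}."""
    result: dict = {}
    for pair in spec.split(","):
        key, _, value = pair.partition("=")
        node = result
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})  # create nested dicts for dotted keys
        node[leaf] = value
    return result

print(parse_runtime_params("param_key1=value1,param_key2=2.0"))
# Dotted keys become nested parameters:
print(parse_runtime_params("model_options.test_size=0.1"))
```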
### `create_default_data_set()` was removed from `Runner`
Kedro 0.19 removed the `create_default_data_set()` method from the `Runner`. To override the default dataset creation, use the new `Runner` class argument `extra_dataset_patterns` instead.

On class instantiation, pass the `extra_dataset_patterns` argument to override the default `MemoryDataset` creation as follows:

```python
from kedro.runner import ThreadRunner

runner = ThreadRunner(extra_dataset_patterns={"{default}": {"type": "MyCustomDataset"}})
```
### `project_version` was removed
Kedro 0.19 removed project_version in pyproject.toml. Use kedro_init_version instead:
```diff
[tool.kedro]
package_name = "my_project"
project_name = "my project"
- project_version = "0.19.1"
+ kedro_init_version = "0.19.1"
```
### Datasets changes in 0.19

#### The `layer` attribute in `catalog.yml` has moved
From 0.19, the top-level `layer` attribute has moved inside the `metadata` -> `kedro-viz` attribute. You need to update `catalog.yml` accordingly.
The following `catalog.yml` entry changes from this in 0.18.x:

```yaml
companies:
  type: pandas.CSVDataSet
  filepath: data/01_raw/companies.csv
  layer: raw
```
to this in 0.19.x:
```yaml
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
  metadata:
    kedro-viz:
      layer: raw
```
See the Kedro-Viz documentation for more information.
#### For `APIDataset`, the requests-specific arguments in `catalog.yml` have moved
From 0.19, if you use `APIDataset`, you need to move all requests-specific arguments, such as `params` and `headers`, under `load_args`. The `url` and `method` arguments are not affected.
For example, the following `APIDataset` entry in `catalog.yml` changes from this in 0.18.x:

```yaml
us_corn_yield_data:
  type: api.APIDataSet
  url: https://quickstats.nass.usda.gov
  credentials: usda_credentials
  params:
    key: SOME_TOKEN
    format: JSON
```
to this in 0.19.x (note that the class is also renamed to `APIDataset`, as described below):

```yaml
us_corn_yield_data:
  type: api.APIDataset
  url: https://quickstats.nass.usda.gov
  credentials: usda_credentials
  load_args:
    params:
      key: SOME_TOKEN
      format: JSON
```
#### Dataset renaming
In 0.19.0 we renamed dataset and error classes to follow the Kedro lexicon:

- Dataset classes ending with `DataSet` are replaced by classes that end with `Dataset`.
- Error classes starting with `DataSet` are replaced by classes that start with `Dataset`.
All the classes below are also importable from `kedro.io`; only the module where they are defined is listed as the location.

| Type | Removed Alias | Location |
|---|---|---|
| `AbstractDataset` | `AbstractDataSet` | `kedro.io.core` |
| `AbstractVersionedDataset` | `AbstractVersionedDataSet` | `kedro.io.core` |
| `CachedDataset` | `CachedDataSet` | `kedro.io.cached_dataset` |
| `LambdaDataset` | `LambdaDataSet` | `kedro.io.lambda_dataset` |
| `MemoryDataset` | `MemoryDataSet` | `kedro.io.memory_dataset` |
| `DatasetError` | `DataSetError` | `kedro.io.core` |
| `DatasetAlreadyExistsError` | `DataSetAlreadyExistsError` | `kedro.io.core` |
| `DatasetNotFoundError` | `DataSetNotFoundError` | `kedro.io.core` |
#### All other dataset classes are removed from the core Kedro repository (`kedro.extras.datasets`)
You now need to install and import datasets from the kedro-datasets package instead.
### Configuration changes in 0.19
The ConfigLoader and TemplatedConfigLoader classes were deprecated in Kedro 0.18.12 and were removed in Kedro 0.19.0. To use that release or later, you must now adopt the OmegaConfigLoader. The configuration migration guide outlines the primary distinctions between the old loaders and the OmegaConfigLoader, and provides step-by-step instructions on updating your code base to use the new class effectively.
#### Changes to the default environments
The default configuration environments changed in 0.19 and need to be declared explicitly in `settings.py` if you pass custom arguments. For example, if you use `CONFIG_LOADER_ARGS` in `settings.py` to read Spark configuration, you need to add `base_env` and `default_run_env` explicitly.
Before 0.19.x:

```python
CONFIG_LOADER_ARGS = {
    # "base_env": "base",
    # "default_run_env": "local",
    "config_patterns": {
        "spark": ["spark*", "spark*/**"],
    }
}
```

In 0.19.x:

```python
CONFIG_LOADER_ARGS = {
    "base_env": "base",
    "default_run_env": "local",
    "config_patterns": {
        "spark": ["spark*", "spark*/**"],
    }
}
```
If you didn't use CONFIG_LOADER_ARGS in your code, this change is not needed because Kedro sets it by default.
#### Logging
logging.yml is now independent of Kedro's run environment and used only if KEDRO_LOGGING_CONFIG is set to point to it. The documentation on logging describes in detail how logging works in Kedro and how it can be customised.