06 Skills Refactor

Overloaded Operators

Modify and refine skills using operator overloading.

- Combining Skills: Use the + operator to chain skills or run them in parallel; the > operator then supplies a description of how the combined skills coordinate.
  new_skill = skillA + skillB > "Explanation of how skills A and B operate together"
- Refactoring Skills: Use the > operator to enhance or modify an existing skill.
  refactored_skill = skill > "Descriptive alterations or enhancements"
- Decomposing Skills: Use the < operator to break a skill down into simpler components.
  simpler_skills = skill < "Description of how the skill should be decomposed"

Notes:

- Provide accurate descriptions when using overloaded operators so that skill modifications stay clear and understandable.
- Validate skills with the test method to ensure functionality post-modification.
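The +, >, and < operators above are ordinary Python operator overloads. As a rough illustration of the mechanics only (not the creator library's actual implementation; the Skill and SkillGroup classes below are invented for this sketch), they can be modeled like this:

```python
# Illustrative sketch of skill operator overloading via __add__ / __gt__ / __lt__.
# These classes are stand-ins, not the creator library's real types.

class Skill:
    def __init__(self, name, description=""):
        self.name = name
        self.description = description

    def __add__(self, other):
        # skillA + skillB: collect skills into a group awaiting a description.
        return SkillGroup([self, other])

    def __gt__(self, description):
        # skill > "...": refine this skill according to a description.
        return Skill(self.name, description)

    def __lt__(self, description):
        # skill < "...": decompose into simpler skills (trivial split here).
        return [Skill(f"{self.name}_part{i}") for i in (1, 2)]

class SkillGroup:
    def __init__(self, skills):
        self.skills = skills

    def __add__(self, other):
        # Allow chains of more than two skills: a + b + c.
        return SkillGroup(self.skills + [other])

    def __gt__(self, description):
        # group > "...": collapse the group into one combined skill.
        combined_name = "_and_".join(s.name for s in self.skills)
        return Skill(combined_name, description)

clean = Skill("data_cleaning")
viz = Skill("data_visualization")
# + binds tighter than >, so the group forms first, then gets its description.
combined = clean + viz > "clean the data, then plot it"
print(combined.name)  # data_cleaning_and_data_visualization
```

Because + has higher precedence than >, expressions like skillA + skillB > "..." group the skills first and then apply the description, which is exactly the shape used throughout this notebook.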
In [1]:
from creator import create
from creator.core.skill import CodeSkill, BaseSkillMetadata
1 Refactor Case 1: Fine-tune a Skill Object
- add input parameter
- add output
- change logic
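As a plain-Python sketch of what the "add output" refinement below asks for (illustrative only: the real refactored code is generated by the library and operates on pandas data; this stand-in uses lists):

```python
# Hypothetical stand-in for the refactored skill: returns cleaned data plus
# statistics on what was removed. Not the library-generated implementation.

def data_cleaning_with_stats(data, remove_duplicates=True):
    # Treat string 'null'/'NaN' markers as missing values.
    normalized = [None if x in ("null", "NaN") else x for x in data]
    cleaned = [x for x in normalized if x is not None]
    nulls_removed = len(normalized) - len(cleaned)
    duplicates_removed = 0
    if remove_duplicates:
        deduped = list(dict.fromkeys(cleaned))  # order-preserving dedupe
        duplicates_removed = len(cleaned) - len(deduped)
        cleaned = deduped
    return cleaned, {"nulls_removed": nulls_removed,
                     "duplicates_removed": duplicates_removed}

cleaned, stats = data_cleaning_with_stats([1, "null", 2, 2, "NaN"])
print(cleaned, stats)  # [1, 2] {'nulls_removed': 2, 'duplicates_removed': 1}
```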
In [2]:
skillA_schema = {
    "skill_name": "data_cleaning",
    "skill_description": "Cleans input data by converting string representations of 'null' and 'NaN' to actual null values, and then removing null values and duplicates.",
    "skill_tags": ["data_processing", "cleaning", "null_removal", "duplicate_removal"],
    "skill_usage_example": "data_cleaning(input_data, remove_duplicates=True)",
    "skill_program_language": "python",
    "skill_code": """
def data_cleaning(data, remove_duplicates=True):
    # Convert string representations of null to actual null values
    data = data.replace({'null': None, 'NaN': None})
    # Remove null values
    data = data.dropna()
    # Remove duplicates if specified
    if remove_duplicates:
        data = data.drop_duplicates()
    return data
""",
    "skill_parameters": [
        {
            "param_name": "data",
            "param_type": "array",
            "param_description": "The input dataset that needs cleaning.",
            "param_required": True
        },
        {
            "param_name": "remove_duplicates",
            "param_type": "boolean",
            "param_description": "Flag to determine if duplicates should be removed. Defaults to True.",
            "param_required": False,
            "param_default": True
        }
    ],
    "skill_return": {
        "param_name": "cleaned_data",
        "param_type": "array",
        "param_description": "The cleaned dataset with string 'null'/'NaN' values converted to actual nulls, and nulls and duplicates removed based on specified parameters."
    },
    "skill_dependencies": [
        {
            "dependency_name": "pandas",
            "dependency_version": "1.2.0",
            "dependency_type": "package"
        }
    ]
}
skillA = CodeSkill(**skillA_schema)
skillA.skill_metadata = BaseSkillMetadata()
In [3]:
skillA.show()
Skill Details:
• Name: data_cleaning
• Description: Cleans input data by converting string representations of 'null' and 'NaN' to actual null values, and then removing null values and duplicates.
• Version: 1.0.0
• Usage: data_cleaning(input_data, remove_duplicates=True)
• Parameters:
    • data (array): The input dataset that needs cleaning.
        • Required: True
    • remove_duplicates (boolean): Flag to determine if duplicates should be removed. Defaults to True.
        • Default: True
• Returns:
    • cleaned_data (array): The cleaned dataset with string 'null'/'NaN' values converted to actual nulls, and nulls and duplicates removed based on specified parameters.
In [4]:
add_param_skillA = skillA > "add a parameter that allows the choice of whether to remove duplicate values"
In [5]:
add_param_skillA.show()
Skill Details:
• Name: data_cleaning
• Description: Cleans input data by converting string representations of 'null' and 'NaN' to actual null values, and then removing null values and duplicates based on specified parameters.
• Version: 1.0.0
• Usage: data_cleaning(input_data, remove_nulls=True, remove_duplicates=True)
• Parameters:
    • data (array): The input dataset that needs cleaning.
        • Required: True
    • remove_nulls (boolean): Flag to determine if null values should be removed. Defaults to True.
        • Required: True
        • Default: True
    • remove_duplicates (boolean): Flag to determine if duplicates should be removed. Defaults to True.
        • Required: True
        • Default: True
• Returns:
    • cleaned_data (array): The cleaned dataset with string 'null'/'NaN' values converted to actual nulls, and nulls and duplicates removed based on specified parameters.
In [6]:
add_output_skillA = skillA > "Not only get cleaned data, but also want to get statistics on deleted null and duplicate values"
In [7]:
add_output_skillA.show()
Skill Details:
• Name: data_cleaning_with_stats
• Description: Cleans input data by converting string representations of 'null' and 'NaN' to actual null values, and then removing null values and duplicates. Additionally, it provides statistics on the number of null and duplicate values removed.
• Version: 1.0.0
• Usage: cleaned_data, stats = data_cleaning_with_stats(input_data, remove_duplicates=True)
• Parameters:
    • data (array): The input dataset that needs cleaning.
        • Required: True
    • remove_duplicates (boolean): Flag to determine if duplicates should be removed. Defaults to True.
        • Required: True
        • Default: True
• Returns:
    • cleaned_data (array): The cleaned dataset with string 'null'/'NaN' values converted to actual nulls, and nulls and duplicates removed based on specified parameters.
    • stats (dictionary): A dictionary containing the count of null and duplicate values removed from the dataset.
In [8]:
change_logic_skillA = skillA > 'Convert all "null" or "NaN" of string type to true null before removing nulls'
In [9]:
change_logic_skillA.show()
Skill Details:
• Name: data_cleaning
• Description: Cleans input data by converting string representations of 'null' and 'NaN' to actual null values, and then removing null values and duplicates.
• Version: 1.0.0
• Usage: data_cleaning(input_data, remove_duplicates=True)
• Parameters:
    • data (array): The input dataset that needs cleaning.
        • Required: True
    • remove_duplicates (boolean): Flag to determine if duplicates should be removed. Defaults to True.
        • Required: True
        • Default: True
• Returns:
    • cleaned_data (array): The cleaned dataset with string 'null'/'NaN' values converted to actual nulls, and nulls and duplicates removed based on specified parameters.
2 Refactor Case 2: Combine Skills into One
- chain skills
- internal logic combine
- parallel combine
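Before running the cases, here is a plain-Python sketch of what the two combination styles mean for the resulting code (stand-in functions invented for illustration, not the library's generated output):

```python
# Stand-in skill bodies, simplified to plain lists.

def data_cleaning(data):
    return [x for x in data if x is not None]

def data_statistics(data):
    return sum(data) / len(data)

def data_visualization(data):
    # Stand-in for a matplotlib bar chart; just returns the bar heights.
    return list(data)

# Chained: the output of one skill feeds directly into the next.
def clean_then_average(data):
    return data_statistics(data_cleaning(data))

# Parallel: after a shared cleaning step, independent skills run side by
# side on the same cleaned input, and both results are returned.
def clean_then_visualize_and_average(data):
    cleaned = data_cleaning(data)
    return data_visualization(cleaned), data_statistics(cleaned)

print(clean_then_average([1, None, 3]))  # 2.0
```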
In [10]:
def create_combine_testcase():
    skillA_json = {
        "skill_name": "data_cleaning",
        "skill_description": "This skill is responsible for cleaning the input data by removing empty values. It provides a simple way to preprocess data and make it ready for further analysis or visualization.",
        "skill_tags": ["cleaning", "preprocessing", "data"],
        "skill_usage_example": "data_cleaning(input_data)",
        "skill_program_language": "python",
        "skill_code": """
def data_cleaning(data):
    \"\"\"Clean the data by removing empty values.\"\"\"
    return [item for item in data if item is not None]
""",
        "skill_parameters": [
            {
                "param_name": "data",
                "param_type": "array",
                "param_description": "The input data that needs cleaning. It should be a list of values.",
                "param_required": True
            }
        ],
        "skill_return": {
            "param_name": "cleaned_data",
            "param_type": "array",
            "param_description": "The cleaned data after removing empty values."
        },
        "skill_dependencies": None
    }
    skillA = CodeSkill(**skillA_json)
    skillA.skill_metadata = BaseSkillMetadata()
    skillB_json = {
        "skill_name": "data_visualization",
        "skill_description": "This skill is responsible for visualizing the input data by generating a bar chart. It helps in understanding the data distribution and patterns.",
        "skill_tags": ["visualization", "chart", "data"],
        "skill_usage_example": "data_visualization(input_data)",
        "skill_program_language": "python",
        "skill_code": """
import matplotlib.pyplot as plt

def data_visualization(data):
    \"\"\"Visualize the data using a bar chart.\"\"\"
    plt.bar(range(len(data)), data)
    plt.show()
""",
        "skill_parameters": [
            {
                "param_name": "data",
                "param_type": "array",
                "param_description": "The input data that needs to be visualized. It should be a list of values.",
                "param_required": True
            }
        ],
        "skill_return": None,
        "skill_dependencies": [
            {
                "dependency_name": "matplotlib",
                "dependency_version": "3.4.3",
                "dependency_type": "package"
            }
        ]
    }
    skillB = CodeSkill(**skillB_json)
    skillB.skill_metadata = BaseSkillMetadata()
    skillC_json = {
        "skill_name": "data_statistics",
        "skill_description": "This skill calculates the average value of the input data. It provides a basic statistical overview of the dataset.",
        "skill_tags": ["statistics", "average", "data"],
        "skill_usage_example": "data_statistics(input_data)",
        "skill_program_language": "python",
        "skill_code": """
def data_statistics(data):
    \"\"\"Calculate the average of the data.\"\"\"
    return sum(data) / len(data)
""",
        "skill_parameters": [
            {
                "param_name": "data",
                "param_type": "array",
                "param_description": "The input data for which the average needs to be calculated. It should be a list of numerical values.",
                "param_required": True
            }
        ],
        "skill_return": {
            "param_name": "average",
            "param_type": "float",
            "param_description": "The average value of the input data."
        },
        "skill_dependencies": None
    }
    skillC = CodeSkill(**skillC_json)
    skillC.skill_metadata = BaseSkillMetadata()
    return skillA, skillB, skillC
In [11]:
skillA, skillB, skillC = create_combine_testcase()
In [12]:
skillA.show()
Skill Details:
• Name: data_cleaning
• Description: This skill is responsible for cleaning the input data by removing empty values. It provides a simple way to preprocess data and make it ready for further analysis or visualization.
• Version: 1.0.0
• Usage: data_cleaning(input_data)
• Parameters:
    • data (array): The input data that needs cleaning. It should be a list of values.
        • Required: True
• Returns:
    • cleaned_data (array): The cleaned data after removing empty values.
In [13]:
skillB.show()
Skill Details:
• Name: data_visualization
• Description: This skill is responsible for visualizing the input data by generating a bar chart. It helps in understanding the data distribution and patterns.
• Version: 1.0.0
• Usage: data_visualization(input_data)
• Parameters:
    • data (array): The input data that needs to be visualized. It should be a list of values.
        • Required: True
• Returns:
In [14]:
skillC.show()
Skill Details:
• Name: data_statistics
• Description: This skill calculates the average value of the input data. It provides a basic statistical overview of the dataset.
• Version: 1.0.0
• Usage: data_statistics(input_data)
• Parameters:
    • data (array): The input data for which the average needs to be calculated. It should be a list of numerical values.
        • Required: True
• Returns:
    • average (float): The average value of the input data.
In [15]:
chained_skill = skillA + skillB > "I have a dataset with empty values. First, I want to clean the data by removing the empty values, then visualize it using a bar chart."
In [16]:
chained_skill.show()
Skill Details:
• Name: clean_and_visualize_data
• Description: This skill is responsible for cleaning the input data by removing empty values and then visualizing it by generating a bar chart. It provides a simple way to preprocess and understand data.
• Version: 1.0.0
• Usage: clean_and_visualize_data(input_data)
• Parameters:
    • data (array): The input data that needs cleaning and visualization. It should be a list of values.
        • Required: True
• Returns:
    • cleaned_data (array): The cleaned data after removing empty values.
In [17]:
internal_logic_combined_skill = skillA + skillB + skillC > "I have a dataset. I want to both visualize the data using a bar chart and calculate its average simultaneously"
In [18]:
internal_logic_combined_skill.show()
Skill Details:
• Name: data_analysis
• Description: This skill is responsible for cleaning the input data, visualizing it by generating a bar chart, and calculating its average. It provides a comprehensive way to preprocess, understand, and analyze data.
• Version: 1.0.0
• Usage: data_analysis(input_data)
• Parameters:
    • data (array): The input data that needs cleaning, visualization, and average calculation. It should be a list of values.
        • Required: True
• Returns:
    • cleaned_data (array): The cleaned data after removing empty values.
    • visualization (object): The visualization of the data.
    • average (float): The average value of the input data.
In [19]:
parallel_combined = skillA + skillB + skillC > "I have a dataset. I want to both visualize the data using a bar chart and calculate its average simultaneously"
In [20]:
parallel_combined.show()
Skill Details:
• Name: data_analysis
• Description: This skill is responsible for cleaning the input data, visualizing it by generating a bar chart, and calculating its average. It provides a comprehensive way to preprocess, understand, and analyze data.
• Version: 1.0.0
• Usage: data_analysis(input_data)
• Parameters:
    • data (array): The input data that needs cleaning, visualization, and average calculation. It should be a list of values.
        • Required: True
• Returns:
    • cleaned_data (array): The cleaned data after removing empty values.
    • visualization (object): The visualization of the data.
    • average (float): The average value of the input data.
3 Refactor Case 3: Decompose One Skill into Multiple
- decompose
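As an illustration of the shape this decomposition takes, the combined wrapper splits into its two constituent functions (stand-in bodies invented for the sketch; visualize_data returns the bar heights instead of drawing a matplotlib chart):

```python
# Hypothetical result of decomposition: two independent, reusable skills.

def visualize_data(input_data):
    # Stand-in for the bar-chart skill; returns the bar heights.
    return list(input_data)

def calculate_average(input_data):
    # The statistics skill: mean of the input values.
    return sum(input_data) / len(input_data)

# The original combined skill is then just a thin wrapper over the parts.
def data_visualization_and_statistics(input_data):
    visualize_data(input_data)
    return calculate_average(input_data)
```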
In [21]:
def create_complex_skill():
    skill_json = {
        'skill_name': 'data_visualization_and_statistics',
        'skill_description': 'This skill is responsible for visualizing the input data using a bar chart and calculating its average simultaneously. It provides a comprehensive overview of the dataset.',
        'skill_metadata': {'created_at': '2023-09-30 00:26:46', 'author': 'gongjunmin', 'updated_at': '2023-09-30 00:26:46', 'usage_count': 0, 'version': '1.0.0', 'additional_kwargs': {}},
        'skill_tags': ['data visualization', 'statistics', 'bar chart'],
        'skill_usage_example': 'data_visualization_and_statistics(input_data)',
        'skill_program_language': 'python',
        'skill_code': 'def data_visualization_and_statistics(input_data):\n    visualize_data(input_data)\n    calculate_average(input_data)',
        'skill_parameters': [
            {'param_name': 'input_data', 'param_type': 'any', 'param_description': 'The input dataset to be visualized and analyzed.', 'param_required': True, 'param_default': None}
        ],
        'skill_return': None,
        'skill_dependencies': [
            {'dependency_name': 'visualize_data', 'dependency_version': '', 'dependency_type': 'built-in'},
            {'dependency_name': 'calculate_average', 'dependency_version': '', 'dependency_type': 'built-in'}
        ]
    }
    skill = CodeSkill(**skill_json)
    skill.skill_metadata = BaseSkillMetadata()
    skill.conversation_history = []
    return skill
In [22]:
skill = create_complex_skill()
In [23]:
decomposed_skills = skill < "I want to decompose this skill into two skills: one for visualizing the data using a bar chart, and one for calculating the average."
In [24]:
len(decomposed_skills)
Out[24]:
2
In [25]:
for decomposed_skill in decomposed_skills:
    decomposed_skill.show()
Skill Details:
• Name: visualize_data
• Description: This skill is responsible for visualizing the input data using a bar chart. It provides a visual overview of the dataset.
• Version: 1.0.0
• Usage: visualize_data(input_data)
• Parameters:
    • input_data (any): The input dataset to be visualized.
        • Required: True
• Returns:
Skill Details:
• Name: calculate_average
• Description: This skill is responsible for calculating the average of the input data. It provides a statistical analysis of the dataset.
• Version: 1.0.0
• Usage: average = calculate_average(input_data)
• Parameters:
    • input_data (any): The input dataset to be analyzed.
        • Required: True
• Returns:
    • average (float): The average of the input data.
Last update: October 31, 2023
Created: October 18, 2023