Python Data Science Jobs & Interviews
Your go-to hub for Python and Data Science—featuring questions, answers, quizzes, and interview tips to sharpen your skills and boost your career in the data-driven world.

Admin: @Hussein_Sheikho
Q: How can reinforcement learning be used to simulate human-like decision-making in dynamic environments? Provide a detailed, advanced-level code example.

In reinforcement learning (RL), agents learn optimal behaviors through trial and error by interacting with an environment. To simulate human-like decision-making, we can use a deep reinforcement learning algorithm such as Proximal Policy Optimization (PPO), which balances exploration and exploitation while adapting to complex, real-time scenarios.
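
To make that "balance" concrete, PPO limits how far each policy update can move by clipping the probability ratio between the new and old policy. Here is a minimal NumPy sketch of that clipped surrogate objective (the function name and toy numbers are illustrative, not part of the example further below):

import numpy as np

def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    # ratio: pi_new(a|s) / pi_old(a|s) for the sampled actions
    # advantage: estimated advantage of those actions
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    # Taking the element-wise minimum discourages updates that move the policy too far at once
    return np.minimum(unclipped, clipped).mean()

# Toy check: the large ratio (1.5) is capped at 1 + clip_eps before it can dominate the update
print(ppo_clipped_objective(np.array([1.5, 0.9]), np.array([1.0, -0.5])))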

Human behavior involves not just reward maximization but also risk aversion, social cues, and emotional responses. We can model these using:
- State representation: Include contextual features (e.g., stress level, past rewards).
- Action space: Discrete or continuous actions mimicking human choices.
- Reward shaping: Incorporate intrinsic motivation (e.g., curiosity) and extrinsic rewards (see the sketch after this list).
- Policy networks: Use neural networks to approximate policies that mimic human reasoning.
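
A quick sketch of the reward-shaping idea from the list above: combine the environment's extrinsic reward with a small curiosity bonus based on prediction error. The shaped_reward helper and its weight beta are illustrative assumptions, not part of stable-baselines3:

import numpy as np

def shaped_reward(extrinsic, predicted_next_state, actual_next_state, beta=0.1):
    # Intrinsic "curiosity" term: reward the agent for reaching states that its
    # internal forward model predicts poorly (a simple prediction-error proxy)
    curiosity = np.linalg.norm(actual_next_state - predicted_next_state)
    return extrinsic + beta * curiosity

# Example: a modest extrinsic reward plus a bonus for a surprising transition
print(shaped_reward(1.0,
                    predicted_next_state=np.array([0.0, 0.5]),
                    actual_next_state=np.array([0.3, 0.9])))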

Here’s a Python example using stable-baselines3 for PPO in a custom environment simulating human decision-making under uncertainty:

import numpy as np
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.evaluation import evaluate_policy

# Define custom environment
class HumanLikeDecisionEnv(gym.Env):
    def __init__(self):
        super().__init__()
        self.action_space = gym.spaces.Discrete(3)  # [0: cautious, 1: neutral, 2: bold]
        self.observation_space = gym.spaces.Box(low=-100, high=100, shape=(4,), dtype=np.float32)
        self.state = None
        self.reset()

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = np.array([np.random.uniform(-50, 50),  # current reward
                               np.random.uniform(0, 10),    # risk tolerance
                               np.random.uniform(0, 1),     # social influence
                               np.random.uniform(-1, 1)],   # emotion factor
                              dtype=np.float32)
        return self.state, {}

    def step(self, action):
        # Simulate human-like response based on action
        reward = 0.0
        if action == 0:    # Cautious
            reward += self.state[0] * 0.8 - np.abs(self.state[1]) * 0.5
        elif action == 1:  # Neutral
            reward += self.state[0] * 0.9
        else:              # Bold
            reward += self.state[0] * 1.2 + np.random.normal(0, 5)

        # Update state with noise and dynamics
        self.state[0] = np.clip(self.state[0] + np.random.normal(0, 2), -100, 100)
        self.state[1] = np.clip(self.state[1] + np.random.uniform(-0.5, 0.5), 0, 10)
        self.state[2] = np.clip(self.state[2] + np.random.uniform(-0.1, 0.1), 0, 1)
        self.state[3] = np.clip(self.state[3] + np.random.normal(0, 0.2), -1, 1)

        terminated = np.random.rand() > 0.95  # Random episode termination
        return self.state, float(reward), terminated, False, {}

# Create environment
env = DummyVecEnv([lambda: HumanLikeDecisionEnv()])  # instantiate the env inside the lambda

# Train PPO agent
model = PPO("MlpPolicy", env, verbose=1, n_steps=128)
model.learn(total_timesteps=10000)

# Evaluate policy
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"Mean reward: {mean_reward:.2f} ± {std_reward:.2f}")

This simulation captures how humans balance risk, emotion, and social context in decisions. The model learns to adapt its strategy over time—mimicking cognitive flexibility.

#ReinforcementLearning #DeepLearning #HumanBehaviorSimulation #AI #MachineLearning #PPO #Python #AdvancedAI #RL #NeuralNetworks

By: @DataScienceQ 🚀