Hey folks,
Looking for some help on a bash script I have been working on for quite some time.
This is a little project I started because I want to learn. I would still consider myself a newbie when it comes to Linux. I have been a Windows user for close to 30 years but work with some Linux servers as part of my job as a Systems Admin. Six months ago or so I got annoyed with Windows 11 and decided to switch to Linux as a daily driver. After lots of research and trying distros like Pop, I landed on Arch as a daily driver because of the "DIY" approach. While it poses a challenge, it's a great opportunity to learn, as you need to read the wiki quite extensively to understand the installation process and customization. After countless clean installs trying different configs, DEs, and WMs, I found a setup I like and wanted to make a bash script for myself that would get Arch installed in a few minutes rather than the more lengthy manual install. Yes, I am aware of the "archinstall" script available right in the live image, but I wanted to make my own.
The issue I am having is that the user's home folder does not exist in /home after the install completes and I reboot and sign in. When I sign in as the created user, the working directory is /.
What confuses me is, this only seems to happen when I select "Server" as part of my script. At the very end, the user is given the choice to pick a Desktop Environment. Option 1: Server (no DE), Option 2: GNOME, or Option 3: Plasma.
GNOME and Plasma seem to create the user folder just fine, server will not and I cannot sort out why. The whole script is on github but I'll put the sections related to my issue below. Any insight at all is appreciated!
Disk Setup
echo -ne "
+-------------------+
| Drive Preparation |
+-------------------+
"
# List Disks
fdisk -l
# Select Disk
read -p "Enter the disk (e.g., /dev/sda): " disk
# Validate Disk Path
if [ ! -b "$disk" ]; then
echo "Invalid disk: $disk. Exiting."
exit 1
fi
# Confirm Disk Selection
echo "You have selected $disk. Is this correct? (Y/n)"
read confirm
# Convert input to lowercase for easier comparison
confirm=${confirm,,}
# If the input is empty or 'y', proceed; otherwise, exit
if [ "$confirm" != "y" ] && [ -n "$confirm" ]; then
echo "Exiting."
exit 1
fi
# List current partitions
echo "Current partitions on $disk:"
fdisk -l "$disk"
# Confirm deletion of existing partitions
echo "This will delete all existing partitions on $disk. Proceed? (Y/n)"
read proceed
# Convert input to lowercase for easier comparison
proceed=${proceed,,}
# If the input is neither 'y' nor empty, exit
if [ "$proceed" != "y" ] && [ -n "$proceed" ]; then
echo "Exiting."
exit 1
fi
# Remove existing partitions and clear old signatures
echo -ne "d\nw" | fdisk "$disk" || { echo "Failed to delete partitions"; exit 1; }
dd if=/dev/zero of="$disk" bs=512 count=1 conv=notrunc || { echo "Failed to wipe disk"; exit 1; }
#shred -n 1 -v "$disk" # Overwrite the disk once with random data
#cryptsetup luksDump "$disk" # Check for any existing LUKS devices
# Create new GPT partition table and partitions with types
echo "Creating new GPT table and partitions on $disk"
(
echo "g" # Create new GPT table
echo "n" # Add new partition
echo "" # Default partition number 1
echo "" # Default to first sector
echo "+2G" # Partition 1 size 2GB
echo "t" # Change partition type
echo "1" # Set type to EFI System
echo "n" # Add new partition
echo "" # Default to partition number 2
echo "" # Default to first sector
echo "+5G" # Partition 2 size 5GB
echo "n" # Add new partition
echo "" # Default to partition number 3
echo "" # Default to first sector
echo "" # Use remaining space
echo "t" # Change partition type
echo "3" # Partition 3
echo "44" # Set type to LVM
echo "w" # Write changes
) | fdisk "$disk" || { echo "Failed to create partitions"; exit 1; }
partprobe "$disk" || { echo "Failed to re-read partition table"; exit 1; }
echo -ne "
+-------------------+
| Perform LVM setup |
+-------------------+
"
# Format partition 1 as FAT32
mkfs.fat -F32 "${disk}1" || { echo "Failed to format ${disk}1"; exit 1; }
# Format partition 2 as ext4
mkfs.ext4 "${disk}2" || { echo "Failed to format ${disk}2"; exit 1; }
# Ask user to set encryption password
read -s -p "Enter encryption password: " password
echo
# Confirm encryption password
read -s -p "Confirm encryption password: " confirm_password
echo
# Check if passwords match
if [ "$password" != "$confirm_password" ]; then
echo "Passwords do not match. Exiting."
exit 1
fi
# Setup encryption on partition 3 using LUKS
echo "$password" | cryptsetup luksFormat "${disk}3" || { echo "Failed to format LUKS partition"; exit 1; }
# Open LUKS partition
echo "$password" | cryptsetup open --type luks --batch-mode "${disk}3" lvm || { echo "Failed to open LUKS partition"; exit 1; }
# Create physical volume for LVM on the opened LUKS container
pvcreate /dev/mapper/lvm || { echo "Failed to create physical volume"; exit 1; }
# Create volume group called volgroup0 on partition 3
vgcreate volgroup0 /dev/mapper/lvm || { echo "Failed to create volume group"; exit 1; }
# Create logical volumes
lvcreate -L 100GB volgroup0 -n lv_root || { echo "Failed to create logical volume lv_root"; exit 1; }
lvcreate -l 100%FREE volgroup0 -n lv_home || { echo "Failed to create logical volume lv_home"; exit 1; }
# Load kernel module
modprobe dm_mod
# Scan system for volume groups
vgscan
# Activate volume group
vgchange -ay || { echo "Failed to activate volume group"; exit 1; }
# Format root volume
mkfs.ext4 /dev/volgroup0/lv_root || { echo "Failed to format root volume"; exit 1; }
# Mount root volume
mount /dev/volgroup0/lv_root /mnt || { echo "Failed to mount root volume"; exit 1; }
# Create /boot directory and mount partition 2
mkdir -p /mnt/boot
mount "${disk}2" /mnt/boot || { echo "Failed to mount /boot"; exit 1; }
# Create /boot/EFI directory and mount partition 1
mkdir -p /mnt/boot/EFI
mount "${disk}1" /mnt/boot/EFI || { echo "Failed to mount /boot/EFI"; exit 1; }
# Format home volume
mkfs.ext4 /dev/volgroup0/lv_home || { echo "Failed to format home volume"; exit 1; }
# Create /home directory and mount home volume
mkdir -p /mnt/home
mount /dev/volgroup0/lv_home /mnt/home || { echo "Failed to mount /home"; exit 1; }
# Ensure /mnt/etc exists
mkdir -p /mnt/etc
echo "Setup completed successfully."
Create Username and User Password Function: This function allows the user to set the username and password
# Create username and password
newuser () {
# Loop through user input until the user gives a valid username
while true
do
read -r -p "Enter a username: " username
if [[ "${username,,}" =~ ^[a-z_]([a-z0-9_-]{0,31}|[a-z0-9_-]{0,30}\$)$ ]]
then
break
fi
echo "Invalid username."
done
export USERNAME=$username
while true
do
read -rs -p "Set a password: " PASSWORD1
echo -ne "\n"
read -rs -p "Confirm password: " PASSWORD2
echo -ne "\n"
if [[ "$PASSWORD1" == "$PASSWORD2" ]]; then
break
else
echo -ne "ERROR! Passwords do not match. \n"
fi
done
export PASSWORD=$PASSWORD1
}
Call Function
# Call functions
newuser
After "newuser" is called, the user is able to set the timezone, then a sub-script called "chroot-setup.sh" is created. That script is where the user is actually created.
useradd section (I know I should not need to mess with /etc/skel, /etc/shadow or /etc/default/useradd, because all I should need is "useradd -m", but even the simplest methods would still fail to make the user home folder)
echo -ne "
+--------------------------------------------------+
| Adding user, setting passwords, setting hostname |
+--------------------------------------------------+
"
# Ensure /etc/skel exists and has correct permissions
if [ ! -d /etc/skel ]; then
echo 'Warning: /etc/skel does not exist. Creating it...'
mkdir -p /etc/skel
chmod 0755 /etc/skel
fi
# Create user
useradd -G wheel,power,storage,uucp,network -s /bin/bash $USERNAME
echo "$USERNAME:$PASSWORD" | chpasswd
# Explicitly manage /etc/skel and create home directory
cp -r /etc/skel /home/$USERNAME
chown -R $USERNAME:$USERNAME /home/$USERNAME
chmod 0700 /home/$USERNAME
echo 'Home directory created and populated from /etc/skel'
# Verify /etc/shadow update
ls -l /etc/shadow # Check before useradd
ls -l /etc/shadow # Check after useradd
echo 'User added and /etc/shadow updated'
# Validate /etc/default/useradd settings
echo '--- /etc/default/useradd settings ---'
echo "SKEL: $(grep ^SKEL= /etc/default/useradd)"
echo "HOME: $(grep ^HOME= /etc/default/useradd)"
echo "SHELL: $(grep ^SHELL= /etc/default/useradd)"
Last edited by live4thamuzik (2024-10-08 23:37:58)
although I do have several more comments, I'll try to stay on topic:
if
useradd -m ...
fails to create and populate the new home directory under /home, there has to be a reason for it, and likely a warning or error explaining why it failed
I doubt useradd fails silently if the home directory cannot be created
there's also a specific exit code for it: 12
so to figure out what goes wrong I would just use the regular
useradd -m ...
and check its exit code
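To make that exit code readable in a log, a small lookup can help. This is only a sketch: the function name explain_useradd_rc is made up for illustration, and it maps just a few of the values listed under EXIT VALUES in useradd(8).

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a useradd exit code into a readable message.
# Only a handful of the codes documented in useradd(8) are covered here.
explain_useradd_rc() {
    case "$1" in
        0)  echo "success" ;;
        6)  echo "specified group doesn't exist" ;;
        9)  echo "name already in use" ;;
        12) echo "can't create home directory" ;;
        *)  echo "see useradd(8), exit code $1" ;;
    esac
}

# Usage inside the installer (USERNAME comes from the original script):
#   useradd -m -s /bin/bash "$USERNAME"
#   rc=$?
#   echo "useradd: $(explain_useradd_rc "$rc")"
```

The installer would call it right after useradd, before any other command has a chance to overwrite $?.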
as you mentioned your project is on github please share a link so we can have a look at it and test it
from what's given I suspect an issue earlier in the script
also: use sgdisk instead of fdisk - it's the cli version meant for use in scripts
Github link: https://github.com/live4thamuzik/ArchL4TM ( it's not pretty but I tried )
I tried swapping in sgdisk in place of fdisk, but that caused more problems right after selecting the disk. As this section is working fine with fdisk (even if it's not best practice), I will leave it alone for now and come back to it later. I'd like to get the home folder sorted so I can then focus on giving the user the ability to install an AUR helper from the script as well. I will read more on this though!
While your "disk setup" contains some error handling, there is no error handling in "newuser".
Add something like
useradd ... || { echo "Failed useradd ..."; exit 1; }
A fixed exit 1 is also bad. It is better to report the real exit code from useradd, $?, but you need to rewrite the code a bit so that it does not report the exit code of the previous "echo" command instead.
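One way that rewrite could look, as a sketch: save $? as the very first statement inside the braces, before echo runs. The fake_useradd function is a stand-in so the snippet can be run without root; in the installer it would be the real useradd call.

```bash
#!/usr/bin/env bash
# Stand-in for a failing useradd (assumption: the real command would be
# something like: useradd -m ... "$USERNAME").
fake_useradd() { return 12; }

create_user() {
    local rc
    fake_useradd || {
        rc=$?                       # save the status before echo overwrites $?
        echo "Failed useradd (exit code $rc)" >&2
        return "$rc"                # a top-level script would use: exit "$rc"
    }
    echo "useradd completed."
}
```

Calling create_user here prints the failure message on stderr and returns 12 instead of a fixed 1.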
It also seems like a good idea to log all output to a logfile.txt in the new system, so you can check later what was done, what was not done, and what ended with errors.
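A minimal sketch of such logging, assuming a wrapper function (run_logged is a made-up name) and a log path that the installer would point at a file inside the new system:

```bash
#!/usr/bin/env bash
# Assumed default location; the installer could set LOGFILE=/mnt/logfile.txt.
LOGFILE=${LOGFILE:-/tmp/logfile.txt}

# Run a command, append its stdout and stderr to the log, and record the
# command line and its exit code around the output.
run_logged() {
    local rc
    echo ">>> $*" >>"$LOGFILE"
    "$@" >>"$LOGFILE" 2>&1
    rc=$?
    echo "<<< exit code $rc" >>"$LOGFILE"
    return "$rc"
}

# Usage in the installer:
#   run_logged useradd -m -s /bin/bash "$USERNAME" || exit $?
```

Because the wrapper returns the original exit code, it can be combined with the usual || error handling.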
I will add your suggestions for error handling and a log file. Thanks!
I added the error handling to useradd
first I tried
useradd, $?
Unfortunately this didn't provide anything useful in the logfile so now I am trying strace.
# Create user and home directory
echo "Tracing useradd with strace..."
strace -f -o useradd.trace useradd -m -G wheel,power,storage,uucp,network -s /bin/bash "$USERNAME"
echo "useradd trace complete."
if [[ $? -ne 0 ]]; then
log_error "Error creating user $USERNAME" $?
exit $? # Exit with the useradd exit code
fi
This will not work as expected: the $? in the if will always be 0, because it refers to the last command, which is echo, which usually does not fail.
And I think you don't need strace here.
It seems to me you need to read up some more on bash basics.
Find a nice tutorial; in addition:
https://wiki.archlinux.org/title/Bash
https://archlinux.org/packages/?name=shellcheck
untested:
useradd -m -G wheel,power,storage,uucp,network -s /bin/bash "$USERNAME"
ret=$?
if [ $ret -ne 0 ]; then
log_error "Error creating user $USERNAME" $ret
exit $ret
else
echo "useradd completed."
fi
Thank you!
I have logs from running with strace, just haven't had time to read the data yet. I also have a log file from the script execution which shows useradd -m ... finished with no errors.
I will continue reading into bash basics and troubleshooting scripts. I think you're onto something with a potential issue earlier in the script.
cat <<EOF > /mnt/chroot-setup.sh
Have you tried reading the chroot-setup.sh after it has been generated? Is it what you expected? I have no reason to think otherwise besides the usual "expect the unexpected".
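One thing worth looking for in the generated file, assuming the delimiter is unquoted as in the line quoted above: variables are expanded at the moment the file is written, not when chroot-setup.sh later runs. The sketch below shows the difference with an example value for USERNAME and temporary file names that are not from the real script.

```bash
#!/usr/bin/env bash
USERNAME=alice              # example value; the installer sets this in newuser
workdir=$(mktemp -d)

# Unquoted delimiter: $USERNAME is expanded while the file is being written.
cat <<EOF > "$workdir/expanded.sh"
useradd -m "$USERNAME"
EOF

# Quoted delimiter: the text is written literally and is expanded only when
# the generated script itself runs.
cat <<'EOF' > "$workdir/literal.sh"
useradd -m "$USERNAME"
EOF

cat "$workdir/expanded.sh"   # useradd -m "alice"
cat "$workdir/literal.sh"    # useradd -m "$USERNAME"
```

Either form can be what you want; reading the generated file shows which one you actually got.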
mount /dev/volgroup0/lv_root /mnt || { echo "Failed to mount root volume"; exit 1; }
...
mkdir -p /mnt/home
mount /dev/volgroup0/lv_home /mnt/home || { echo "Failed to mount /home"; exit 1; }
Did this work? No error? Have you tried unmounting /mnt/home to see if it somehow landed in the /mnt/home folder on lv_root?
Last edited by Awebb (2024-10-08 19:18:16)
You just jogged my memory! In one of my earlier iterations of the script, I was entering chroot-setup.sh and then running
mkdir -p /home
mount /dev/volgroup0/lv_home /home
This would fail in chroot, either because of permissions or because it just couldn't find the logical volume (I don't remember exactly). So I moved those tasks to happen during LVM setup. Now I have the home folder issue. I am going to comment out the deletion of chroot-setup.sh and have a look at the generated file in the installed system. I may even look at moving the creation of /home and mounting lv_home back to chroot-setup.sh to look at the error again.
DING
so creating the user home failed because /home wasn't mounted - told you there had to be a reason other than the useradd command failing
have you checked if the home folder comes up when you unmount the LVM volume, which might be shadowing what's in /home?
Solved.
I found my answer. You're right, /home was not mounted on reboot!
You also called it when you said my issue was earlier in the script. This was missing!
genfstab -U -p /mnt >> /mnt/etc/fstab
User folder was being created the whole time, I just needed to run this to find it.
sudo mount /dev/volgroup0/lv_home /home
cd ~
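A check like the following could catch this class of problem before the first reboot. It is a sketch under assumptions: fstab_has_mount is a made-up name, and it only looks at the second field of non-comment lines.

```bash
#!/usr/bin/env bash
# Return 0 if the fstab file has a non-comment entry whose mount point
# (second field) equals the one requested.
fstab_has_mount() {
    local fstab=$1 mountpoint=$2
    awk -v m="$mountpoint" \
        '!/^[[:space:]]*#/ && $2 == m { found = 1 } END { exit !found }' "$fstab"
}

# After genfstab in the installer:
#   fstab_has_mount /mnt/etc/fstab /home || { echo "no /home entry in fstab"; exit 1; }
```

With an empty or missing /home line the function fails, so the installer can stop instead of rebooting into a system without /home.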
to stay on topic and not fall into "in hindsight":
although I asked for it, I haven't looked at your code - but I'm not sure I would have caught the missing genfstab - I just assumed your code follows the guide correctly
as for more points about your script:
the confirmation uses somewhat backwards, negative logic: if you ask for a confirmation, only an exact match should proceed and anything else should fail - yet you reject only input that is neither y nor blank - so just hitting return counts as confirmation - even if it's just for yourself, that's not how it should be done
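a sketch of what a strict confirmation could look like (the function name confirm and the prompt text are just for illustration):

```bash
#!/usr/bin/env bash
# Ask a yes/no question; succeed only on an explicit y or Y.
# An empty reply, "n", or anything else counts as "no".
confirm() {
    local reply
    printf '%s (y/N) ' "$1" >&2
    read -r reply || return 1
    case "$reply" in
        y|Y) return 0 ;;
        *)   return 1 ;;
    esac
}

# Usage in the disk section (disk is the variable from the original script):
#   confirm "This will delete all existing partitions on $disk. Proceed?" || exit 1
```

here the default is the safe choice: hitting return aborts instead of wiping the disk.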
in programming there's this:
if(flag==false)
- you don't compare against true or false - it is error prone: miss one of the = and you turn an equality check into an assignment, which will always succeed:
if(flag=false)
- if it's a boolean flag you don't compare at all - you just check it
if(flag)
if(!flag)
- fail fast: don't write code that does something and only later checks if everything was cool - check the given parameters first and fail fast
int f(int a, int b)
{
if(a<0)
return E_ARG;
...
}
your username loop is the same
you do
- read username
- validate
- if(validate) break the loop
- otherwise keep repeating
NOPE!
a better approach
- read input
- validate
- if(fail) continue to loop
- process
why: what you want to check for is invalid input - so check for it! - right now you check for valid input and break the code flow - that's both error prone and hard to read/understand
if your requirement is valid input you check for invalid input and reloop if the input is invalid - as the other codepath - valid input - is taken on its own
same for your password
- loop while invalid
- break loop when valid
better:
- loop
- check if not equal - loop
- process
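the "check for invalid, loop again, otherwise fall through" shape could look like this in bash. it's a sketch: ask_username is a made-up name, the pattern is the one from the original script moved into a variable, and prompts go to stderr so stdout carries only the accepted name.

```bash
#!/usr/bin/env bash
# Same pattern as the original check, kept in a variable so quoting stays simple.
username_re='^[a-z_]([a-z0-9_-]{0,31}|[a-z0-9_-]{0,30}\$)$'

ask_username() {
    local name
    while true; do
        printf 'Enter a username: ' >&2
        read -r name || return 1        # stop on end of input
        if [[ ! "${name,,}" =~ $username_re ]]; then
            echo "Invalid username." >&2
            continue                    # invalid input: ask again
        fi
        printf '%s\n' "$name"           # valid input falls through to here
        return 0
    done
}

# Usage:
#   USERNAME=$(ask_username) || exit 1
```

the password loop can follow the same shape: compare the two entries, continue if they differ, and process the password after the check.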
also
echo -ne "
+--------------------------------------------------+
| Adding user, setting passwords, setting hostname |
+--------------------------------------------------+
"
why do you use "-n" but then do the line break yourself? let echo do that for you:
echo -e "+--------------------------------------------------+
| Adding user, setting passwords, setting hostname |
+--------------------------------------------------+"
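since the script prints several of these boxes, a small function could draw them for any text. a sketch only; banner is a made-up name and the layout simply copies the boxes above:

```bash
#!/usr/bin/env bash
# Print a one-line message inside a +---+ box like the ones in the script.
banner() {
    local text=$1 line
    # A run of dashes as wide as the text plus one space of padding per side.
    line=$(printf '%*s' "$(( ${#text} + 2 ))" '' | tr ' ' '-')
    printf '+%s+\n| %s |\n+%s+\n' "$line" "$text" "$line"
}

banner "Drive Preparation"
```

printf adds the line breaks itself, so no echo flags are needed at all.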
// just had a look at the code
that whole chroot-script is garbage - don't do it this way
there's a proper way to automate running commands in chroot
also: why do you install all these packages from within chroot instead of just giving them to pacstrap?
also also: shadow and keyring are pulled in by base, kmod by any kernel, fakeroot by base-devel
calling grub-mkconfig twice
why nvidia-dkms? you're using the standard linux kernel - use the package for it: nvidia
don't call reboot within your script - if the boot fails you have no way to check the log
the overall structure is all over the place - define all your functions at the top with the bit of glue logic all at the bottom - it's really hard to read
some of your code is overall error prone
it's a start - but rather than trying to fix it I would start over again
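to illustrate the "functions at the top, glue logic at the bottom" layout, here is a hypothetical skeleton. the function names and messages are placeholders, not the contents of the real installer:

```bash
#!/usr/bin/env bash
# --- helper and step functions first ---
log() { printf '[%s] %s\n' "$1" "$2"; }

prepare_disk()     { log INFO "would partition $1"; }
install_base()     { log INFO "would pacstrap into $1"; }
configure_system() { log INFO "would configure the system in $1"; }

# --- glue logic last, in one place ---
main() {
    local disk=${1:-/dev/sdX} target=${2:-/mnt}
    prepare_disk "$disk"
    install_base "$target"
    configure_system "$target"
}

main "$@"
```

reading main from top to bottom then gives the whole install order at a glance.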
Thanks for the feedback! I am taking your advice on starting over while trying to correct these issues to the best of my newbie ability.
Taking a more modular approach to the script by breaking it into separate files to make it neater and hopefully easier to read.
install.sh # Main installation script
functions.sh # Global functions
disk_setup.sh # Disk partitioning and LVM
packages.sh # Package installation functions
package-list.txt # List of packages to install
config.sh # System configuration functions
aur_helper.sh # AUR helper installation
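As a sketch of how package-list.txt could feed pacstrap directly: the file format is an assumption here (one package per line, # starts a comment), and read_package_list is a made-up name.

```bash
#!/usr/bin/env bash
# Print the package names from a list file, one per line, dropping comments,
# surrounding whitespace, and blank lines.
read_package_list() {
    sed -e 's/#.*//' \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' "$1"
}

# Usage in the installer:
#   mapfile -t packages < <(read_package_list package-list.txt)
#   pacstrap -K /mnt "${packages[@]}"
```

Keeping the list in a plain file means the package set can change without touching the script.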
To answer your question about the choice of nvidia-dkms: I don't end up keeping the standard linux kernel, I always install linux-tkg (PDS) after the fact. Adding a custom kernel to a script that's supposed to make installation faster seemed like a bad idea. Also, I am finding DKMS works best for me in Hyprland. I have not updated the github repo at this time...LOTS of learning/work to do.
Script rewrite complete, all test runs have been successful so far. Repo has been updated. https://github.com/live4thamuzik/ArchL4TM
Thanks again for the feedback!
shellcheck will find some minor things.
$ shellcheck *
In global_functions.sh line 83:
read -p "Enter the disk to use (e.g., /dev/sda): " disk
^--^ SC2162 (info): read without -r will mangle backslashes.
In global_functions.sh line 100:
read -p "Enter EFI partition size (e.g., 2G, 512M): " efi_size
^--^ SC2162 (info): read without -r will mangle backslashes.
In global_functions.sh line 101:
read -p "Enter boot partition size (e.g., 5G, 1G): " boot_size
^--^ SC2162 (info): read without -r will mangle backslashes.
In global_functions.sh line 217:
read -p "Enter root logical volume size (e.g., 50G, 200G): " root_lv_size
^--^ SC2162 (info): read without -r will mangle backslashes.
In global_functions.sh line 359:
read -p "Enter the number of your timezone choice from this page, or press Enter to see more timezones: " choice
^--^ SC2162 (info): read without -r will mangle backslashes.
In global_functions.sh line 745:
if ! runuser -u $USERNAME -- /bin/bash -c "
^-------^ SC2086 (info): Double quote to prevent globbing and word splitting.
Did you mean:
if ! runuser -u "$USERNAME" -- /bin/bash -c "
For more information:
https://www.shellcheck.net/wiki/SC2086 -- Double quote to prevent globbing ...
https://www.shellcheck.net/wiki/SC2162 -- read without -r will mangle backs...
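What SC2162 warns about can be seen with a short experiment. The input string is just an example value:

```bash
#!/usr/bin/env bash
# Feed the same line to read with and without -r and print what was stored.
input='C:\temp'

without_r=$(printf '%s\n' "$input" | { read line; printf '%s' "$line"; })
with_r=$(printf '%s\n' "$input" | { read -r line; printf '%s' "$line"; })

printf 'without -r: %s\n' "$without_r"   # backslash consumed: C:temp
printf 'with -r:    %s\n' "$with_r"      # kept as typed:      C:\temp
```

For prompts that expect paths or sizes this rarely matters, but read -r costs nothing and keeps the input exactly as typed.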
Thank you for this. I will make adjustments and run more tests!
Changes have been applied to the latest commit.